Closed
Bug 444850
Opened 17 years ago
Closed 16 years ago
NSS misbehaves badly in the presence of a disabled PKCS#11 slot
Categories
(NSS :: Libraries, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
3.11.10
People
(Reporter: nelson, Assigned: nelson)
References
Details
Attachments
(6 files, 10 obsolete files)
941 bytes,
patch
|
alvolkov.bgs
:
review+
wtc
:
superreview+
|
Details | Diff | Splinter Review |
1.33 KB,
patch
|
alvolkov.bgs
:
review+
julien.pierre
:
review+
wtc
:
superreview+
|
Details | Diff | Splinter Review |
6.50 KB,
patch
|
julien.pierre
:
review+
wtc
:
superreview+
|
Details | Diff | Splinter Review |
14.44 KB,
patch
|
julien.pierre
:
review+
rrelyea
:
superreview+
|
Details | Diff | Splinter Review |
13.24 KB,
patch
|
rrelyea
:
review+
|
Details | Diff | Splinter Review |
60.07 KB,
patch
|
rrelyea
:
review+
alvolkov.bgs
:
review+
|
Details | Diff | Splinter Review |
In the past month, we've received numerous reports of NSS misbehaving when
it is configured to have Solaris S10's "Crypto Framework" PKCS11 module
(a.k.a. MetaSlot, SoftToken (note: two adjacent T's)). Simply adding that
PKCS#11 module to the secmod.db, and doing nothing else (not putting any certs
or keys into that module, not using that module as the "default" for any mechanisms) is enough to cause the following misbehaviors:
1. The SSL server sends out only its server cert, not the entire cert chain,
even though the entire cert chain is present in the secmod.db and has the
proper and expected trust flags.
2. Received cert chains that are valid and correct are rejected with
"unknown issuer".
3. Attempts to pretty print the contents of a certificate using certutil -Ln
cause certutil to crash.
There are actually many different bugs in NSS revealed by these tests, but
they all have one root cause in common. The PKCS#11 module tells NSS that
it has two slots, both non-removable, and one of them, the second one, is
not present.
This combination of a non-removable, not present, slot/token causes NSS to
mark the slot as disabled during initialization. Being marked as disabled means that NSS does not initialize the token structure associated with the slot, and does not establish a default session for that slot. See
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/pk11wrap/pk11slot.c&rev=1.93&mark=1323-1329,1334#1323
This appears to be as intended and correct. However, thereafter, NSS ignores
these indications that the slot is disabled, and attempts to use the slot
and token anyway. This leads to a multitude of errors, including attempting
to make PKCS#11 API calls on that slot/token, using an invalid session handle
(the value zero).
There is one loop, in particular, that walks through all the "active" slots
(which includes disabled ones, :-( ) trying to look for certs in all those
slots. when it attempts to access one slot with an invalid session number,
and the module correctly rejects that attempt, the loop aborts, reporting
no certs found, even when there were multiple certs found. That is the
immediate cause of most of the symptoms described above.
Having debugged this in detail, I intend to attach a series of patches to
correct various aspects of this. I intend to fix this at various levels,
in a redundant belt-and-suspenders approach. I intend to try to
a) stop NSS from using disabled slots
b) stop NSS from using the invalid session zero with PKCS#11 modules, and
c) get that loop to reports the certs it DID find, even when one or more
slots is unhappy, and
d) not crash in certutil when an attempt to find all certs with a given
subject name fails, even though one such cert is known to exist.
Ideally, I'd like Bob Relyea to review these patches. I'm concerned that
the patches might detect and introduce failures in cases that are intended
to work, despite the obvious problems. But I think we can't wait until
August to release the fixes for these symptoms, so I'll be asking others
to review this in his absence.
Assignee | ||
Comment 1•17 years ago
|
||
Note that, although the platform on which the problem was observed was
Solaris S10, the problems reported here are not specific to that platform
but are in code that is common to all platforms. So, this bug is filed
against ALL/ALL.
Assignee | ||
Comment 2•17 years ago
|
||
certutil finds a cert by nickname, then calls CERT_CreateSubjectCertList
to get a list of certs with that same subject name. At a minimum, the
list should come back with the very same cert from which certutil took
the subject name for the search criteria. But when that function returns
NULL, certutil crashes. This patch fixes that.
I will request review later, after I've attached most of the patches
Assignee | ||
Comment 3•17 years ago
|
||
Among other things, this patch causes nssToken_CreateFromPK11SlotInfo
to not generate a new token object for a disabled slot.
It also avoids a fixes a crash in STAN_InitTokenForSlotInfo that
occurs when nssToken_CreateFromPK11SlotInfo returns NULL
(which it could do before this patch, so this is a real crash fix).
One of my concerns about this patch is that it causes several files
to depend on a new header file for the function PK11_IsDisabled.
I suppose that there is supposed to be some sort of ordering of
dependencies, to prevent circular dependencies, but they aren't
recorded anywhere, and so I don't know if it's OK for files in
lib/dev and lib/pki to depend on pk11pub.h or not.
I hope Bob or Wan-Teh can answer that.
If not, then another option is to put a copy of the disabled bit in
the NSSSlot structure, also.
Assignee | ||
Comment 4•17 years ago
|
||
This patch detects the use of the reserved invalid session number zero
in several places where we blindly accepted it before.
Assignee | ||
Comment 5•17 years ago
|
||
I believe this is the most immediate cause of the problems with
failure to find issuer certs when a slot/token is throwing fits.
Assignee | ||
Comment 6•17 years ago
|
||
Comment on attachment 329182 [details] [diff] [review]
use PK11_IsDisabled to detect and avoid disabled slots, v1
I think I see a problem with this patch for the function
nssTrustDomain_GetActiveSlots. I think this patch makes it leak
NSSSlots that are disabled. :( Will test that.
> for (tp = tokens; *tp; tp++) {
>- slots[count++] = nssToken_GetSlot(*tp);
>+ NSSSlot * slot = nssToken_GetSlot(*tp);
>+ if (!PK11_IsDisabled(slot->pk11slot)) {
>+ slots[count++] = slot;
>+ }
> }
Assignee | ||
Comment 7•17 years ago
|
||
Comment on attachment 329183 [details] [diff] [review]
don't allow (session->handle == CK_INVALID_SESSION), v1
I think this patch is incomplete.
- nssToken_Refresh should be smarter and return failure when
the session handle is zero and when nssSession_ImportNSS3Session
returns NULL.
- All the callers of nssToken_Refresh should pay attention to
its return value, not ignore it.
Assignee | ||
Comment 8•17 years ago
|
||
This seems more complete than the previous version of this patch.
Attachment #329183 -
Attachment is obsolete: true
Assignee | ||
Comment 9•17 years ago
|
||
This version of the patch plugs the leak in the previous version.
Attachment #329182 -
Attachment is obsolete: true
Assignee | ||
Comment 10•17 years ago
|
||
corrected version.
Attachment #329189 -
Attachment is obsolete: true
Assignee | ||
Comment 11•17 years ago
|
||
I have one more concern about these patches. I'm concerned that there are
causes for slots to be disabled temporarily, and this patch might have the
unintended effect of permanently disabling (for the lifetime of a process)
a slot that was intended to be disabled only temporarily.
I think that slots with removable tokens may be disabled while the tokens
are removed. I guess it will take lots of testing to find out.
Too bad we don't have any sort of regression testing for removable tokens.
Assignee | ||
Updated•17 years ago
|
Priority: -- → P1
Assignee | ||
Comment 12•17 years ago
|
||
Comment on attachment 329171 [details] [diff] [review]
fix certutil crash, v1 (checked in on trunk and branch)
This patch is now tested, and as expected, it stops the crash in certutil -L -n Server-Cert.
Attachment #329171 -
Flags: superreview?(wtc)
Attachment #329171 -
Flags: review?(alexei.volkov.bugs)
Assignee | ||
Comment 13•17 years ago
|
||
I am able to reproduce this problem in selfserv now. The failure occurs
while trying to build the server's cert chain to send out to the client.
The stack is:
=>[1] nssTrustDomain_FindCertificatesBySubject(td = 0x133510, subject = 0x1531d8, rvOpt = (nil), maximumOpt = 0, arenaOpt = 0x13c4c0), line 638 in "trustdomain.c"
[2] find_cert_issuer(c = 0x1531a0, timeOpt = (nil), usage = 0xffbff284, policiesOpt = (nil), td = 0x133510, cc = 0x13a870), line 443 in "certificate.c"
[3] nssCertificate_BuildChain(c = 0x1531a0, timeOpt = (nil), usage = 0xffbff3a0, policiesOpt = (nil), rvOpt = (nil), rvLimit = 20U, arenaOpt = (nil), statusOpt = (nil), td = 0x133510, cc = 0x13a870), line 514 in "certificate.c"
[4] NSSCertificate_BuildChain(c = 0x13c680, timeOpt = (nil), usage = 0xffbff3a0, policiesOpt = (nil), rvOpt = (nil), rvLimit = 20U, arenaOpt = (nil), statusOpt = (nil), td = 0x133510, cc = 0x13a870), line 562 in "certificate.c"
[5] CERT_CertChainFromCert(cert = 0x13b598, usage = certUsageSSLServer, includeRoot = 1), line 955 in "certhigh.c"
[6] SSL_ConfigSecureServer(fd = 0x47170, cert = 0x13b598, key = 0x142bb8, kea = ssl_kea_rsa), line 736 in "sslsecur.c"
[7] server_main(listen_sock = 0x47170, requestCert = 0, privKey = 0xffbff5ac, cert = 0xffbff5c0), line 1450 in "selfserv.c"
[8] main(argc = 11, argv = 0xffbff66c), line 2086 in "selfserv.c"
I expect that the "Don't abort" patch above will suffice to cure this particular symptom. I will test that shortly.
Assignee | ||
Comment 14•17 years ago
|
||
Comment on attachment 329184 [details] [diff] [review]
Don't abort search for certs when a slot reports error, v1
As expected, this patch cures the symptom of the server sending out an empty cert chain (containing only the server cert).
Attachment #329184 -
Flags: superreview?(wtc)
Attachment #329184 -
Flags: review?(alexei.volkov.bugs)
Assignee | ||
Comment 15•17 years ago
|
||
Comment on attachment 329190 [details] [diff] [review]
use PK11_IsDisabled to detect and avoid disabled slots, v2
As expected, this patch has the effect of keeping the disabled slot out of the trust domain's token list, which in turn means that NSS will never attempt to search the disabled token/slot for certs. This patch AVOIDS the problem that the previous patch works around.
However, the question is: will this patch have undesired detrimental effects on removable tokens? I'm asking Bob to review it for that reason.
Attachment #329190 -
Flags: superreview?(wtc)
Attachment #329190 -
Flags: review?(rrelyea)
Assignee | ||
Comment 16•17 years ago
|
||
The previous version of this patch was dependent on another patch.
Since I want these patches to be independent, I made this more
complete. This patch now contains one piece also found in another
of these patches.
Also, I found that there were some other bugs in the patched functions,
and I fixed them, too.
I found that, without this patch, the number of calls to find_objects
that attempts to get C_FindObjectsInit to use an invalid (zero) session
handle is staggering!
Attachment #329191 -
Attachment is obsolete: true
Assignee | ||
Comment 17•17 years ago
|
||
This part is low enough risk that it needn't wait for Bob's return.
Attachment #329623 -
Flags: superreview?(wtc)
Attachment #329623 -
Flags: review?(alexei.volkov.bugs)
Assignee | ||
Comment 18•17 years ago
|
||
Comment on attachment 329622 [details] [diff] [review]
don't allow (session->handle == CK_INVALID_SESSION), v4
I want Bob to review this for inobvious issues.
I'm also going to test this on another non-Solaris system that has a removable token.
Attachment #329622 -
Flags: review?(rrelyea)
Comment 19•17 years ago
|
||
Comment on attachment 329171 [details] [diff] [review]
fix certutil crash, v1 (checked in on trunk and branch)
r+ with one comment: should not call PORT_SetError to avoid real error code to be overwritten.
Attachment #329171 -
Flags: review?(alexei.volkov.bugs) → review+
Comment 20•17 years ago
|
||
Comment on attachment 329171 [details] [diff] [review]
fix certutil crash, v1 (checked in on trunk and branch)
r=wtc.
Why do you change the error code to SEC_ERROR_LIBRARY_FAILURE?
First, we'll lose the error code set by CERT_CreateSubjectCertList.
Second, SEC_ERROR_LIBRARY_FAILURE means an NSS internal error (bug),
but CERT_CreateSubjectCertList can fail for other reasons (e.g.,
out of memory).
Attachment #329171 -
Attachment description: fix certutil crash, v1 → fix certutil crash, v1 (wtc disagrees with the SEC_ERROR_LIBRARY_FAILUREerror code)
Attachment #329171 -
Flags: superreview?(wtc) → superreview+
Assignee | ||
Comment 21•17 years ago
|
||
> Why do you change the error code to SEC_ERROR_LIBRARY_FAILURE?
It is not, in general, an error for CERT_CreateSubjectCertList to make no list.
If no certs meet the selection criteria, then making no list is appropriate.
We are not guaranteed that any error code will be set, in general, but in
this case, it is the meaningless SEC_ERROR_IO.
In this code path in certutil, the function is only called when the program
is already in possession of a CERTCertificate for a cert that qualifies.
There's no way that an out-of-memory condition should develop in certutil,
since it is so short lived. The fact that the program already possesses a
CERTCertificate for a cert that meets the criteria means that if
CERT_CreateSubjectCertList returns no list, a software error has occurred.
The program has an existence proof that a certificate meets the criteria and
is known to NSS. If NSS fails to include that cert in the returned list of
certs with that subject, an NSS library has failed to do its job.
Whatever error code is set by the library at that point is irrelevant.
It is actually SEC_ERROR_IO, which is utterly meaningless. (NSS really
should do better at translating PKCS#11 errors into meaningful error codes.)
So, rather than report an error code that will send users and developers off
in pursuit of undomesticated water fowl, I chose to put the blame squarely
where it belongs.
Do you disagree with that?
Comment 22•17 years ago
|
||
Comment on attachment 329184 [details] [diff] [review]
Don't abort search for certs when a slot reports error, v1
r=wtc.
I would not align the '=' signs in the initializations of 'count'
and 'errors' because that's not the prevalent style in this file.
Since 'count' is now used in a larger scope, it would be nice to
give it a more meaningful name such as 'numCerts' or 'collectionCount'.
Attachment #329184 -
Flags: superreview?(wtc) → superreview+
Comment 23•17 years ago
|
||
(In reply to comment #21)
It's not obvious at all. You should add a short comment to explain
the SEC_ERROR_LIBRARY_FAILURE error code.
Comment 24•17 years ago
|
||
Comment on attachment 329190 [details] [diff] [review]
use PK11_IsDisabled to detect and avoid disabled slots, v2
r=wtc.
1. mozilla/security/nss/lib/dev/devslot.c
>+ return (PRBool)!PK11_IsDisabled(slot->pk11slot);
The (PRBool) typecast is not necessary because PK11_IsDisabled
returns PRBool.
2. mozilla/security/nss/lib/pk11wrap/dev3hack.c
>+ /* Don't create a token object for a disabled slot */
>+ if (nss3slot->disabled) {
>+ return NULL;
>+ }
Would be nice to set an error code.
3. mozilla/security/nss/lib/pki/pki3hack.c
> token = nssToken_CreateFromPK11SlotInfo(td, slot);
> PK11Slot_SetNSSToken(slot, token);
>- NSSRWLock_LockWrite(td->tokensLock);
>- nssList_Add(td->tokenList, token);
>- NSSRWLock_UnlockWrite(td->tokensLock);
>+ /* Don't add non-existent token to TD's token list */
>+ if (token) {
>+ NSSRWLock_LockWrite(td->tokensLock);
>+ nssList_Add(td->tokenList, token);
>+ NSSRWLock_UnlockWrite(td->tokensLock);
>+ }
Should we move "PK11Slot_SetNSSToken(slot, token);" into
the if block, too?
4. mozilla/security/nss/lib/pki/trustdomain.c
Should nssTrustDomain_GetActiveSlots free 'slots'
and return NULL if 'count' is 0 at the end of the function?
Attachment #329190 -
Flags: superreview?(wtc) → superreview+
Comment 25•17 years ago
|
||
Comment on attachment 329623 [details] [diff] [review]
lowest risk subset patch to disallow invalid handles (checked in on trunk and branch, with requested changes)
r=wtc.
>+ /* PORT_Assert(session->handle != CK_INVALID_SESSION); */
You should remove this comment.
Attachment #329623 -
Flags: superreview?(wtc) → superreview+
Assignee | ||
Comment 26•17 years ago
|
||
(In reply to comment #24)
Wan-Teh wrote:
> 1. mozilla/security/nss/lib/dev/devslot.c
>
> >+ return (PRBool)!PK11_IsDisabled(slot->pk11slot);
>
> The (PRBool) typecast is not necessary because PK11_IsDisabled
> returns PRBool.
Yes, PK11_IsDisabled returns PRBool, but the returned value is not
the result of PK11_IsDisabled, but rather is the c language "not"
of that, e.g. !PK11_IsDisabled. Is that value a PRBool?
If c compilers all agree that a boolean "not" result is always a PRBool,
then I agree, the cast is unnecessary.
> 2. mozilla/security/nss/lib/pk11wrap/dev3hack.c
>
> >+ /* Don't create a token object for a disabled slot */
> >+ if (nss3slot->disabled) {
> >+ return NULL;
> >+ }
>
> Would be nice to set an error code.
Yeah, I agree. Care to nominate one?
I was too lazy to create one like SEC_ERROR_DISABLED_SLOT
but maybe there already is one that's close enough?
> 3. mozilla/security/nss/lib/pki/pki3hack.c
> Should we move "PK11Slot_SetNSSToken(slot, token);" into
> the if block, too?
I wanted the slot's token pointer value to be updated, even if it
is being updated to NULL.
> 4. mozilla/security/nss/lib/pki/trustdomain.c
>
> Should nssTrustDomain_GetActiveSlots free 'slots'
> and return NULL if 'count' is 0 at the end of the function?
Good question. I need to look into that.
Thanks for the comments.
Comment 27•17 years ago
|
||
you can't convert to prbool by casting, you have to use !! or ! depending. and yes, PRBool is understood to be compatible w/ !! and !.
Comment 28•17 years ago
|
||
PRBool is a synonym of 'int'. It is not an enum
or the built-in bool type of the C++ language.
! applied to a PRBool value is 0 or 1, which is
also a PRBool value. So the (PRBool) cast is
not needed.
Comment 29•17 years ago
|
||
Re: error code for the if (nss3slot->disabled) condition in dev3hack.c
The first issue is whether we should use SEC_ error or NSS_ (Stan)
error here.
If NSS_ error
(http://lxr.mozilla.org/security/source/security/nss/lib/base/errorval.c),
I think NSS_ERROR_INVALID_ARGUMENT, NSS_ERROR_DEVICE_ERROR, or
NSS_ERROR_PKCS11 can be used.
If SEC_ error, I think SEC_ERROR_INVALID_ARGS or SEC_ERROR_NO_TOKEN
can be used.
Setting an approximate or overly generic error code is still better
than using whatever error code is left over from the previous failed
operation.
Assignee | ||
Comment 30•17 years ago
|
||
In reply to comment 28, thanks. I thought that PRBool was an enum.
I think I was confusing it with PRStatus.
Assignee | ||
Comment 31•17 years ago
|
||
Comment on attachment 329184 [details] [diff] [review]
Don't abort search for certs when a slot reports error, v1
seeing second review
Attachment #329184 -
Flags: review?(julien.pierre.boogz)
Assignee | ||
Comment 32•17 years ago
|
||
Comment on attachment 329623 [details] [diff] [review]
lowest risk subset patch to disallow invalid handles (checked in on trunk and branch, with requested changes)
Seeking second review for branch
Attachment #329623 -
Flags: review?(julien.pierre.boogz)
Comment 33•17 years ago
|
||
Comment on attachment 329623 [details] [diff] [review]
lowest risk subset patch to disallow invalid handles (checked in on trunk and branch, with requested changes)
This is fine. I have the same feedback as Wan-Teh here - please delete the commented assertion that this patch adds before check-in.
+ /* PORT_Assert(session->handle != CK_INVALID_SESSION); */
Attachment #329623 -
Flags: review?(julien.pierre.boogz) → review+
Comment 34•17 years ago
|
||
Comment on attachment 329184 [details] [diff] [review]
Don't abort search for certs when a slot reports error, v1
This looks OK, but there are a couple of issues I am wondering about in the code. They are not regressions in the patch.
1) nssToken_FindCertificatesBySubject returns both an array of instances and a status.
But the code only checks the status. It's potentially possible for an array to be returned and leaked. I would suggest to add an assertion that instances is really NULL when we continue in the loop because of status not being PR_SUCCESS.
2)
The whole counting business is a bit confusing. If everything goes right, then count should at most be equal to maximumOpt, but never greater. Yet we allow that case and ignore it.
Again, I would suggest an assertion that ensures that count is <= maximumOpt .
Or we could also perform the count on the number of instances, before we add them to the collection, but that is more code.
Attachment #329184 -
Flags: review?(julien.pierre.boogz) → review+
Updated•17 years ago
|
Attachment #329623 -
Flags: review?(alexei.volkov.bugs) → review+
Assignee | ||
Comment 35•17 years ago
|
||
Comment on attachment 329171 [details] [diff] [review]
fix certutil crash, v1 (checked in on trunk and branch)
I added a comment explaining the library failure error code.
Attachment #329171 -
Attachment description: fix certutil crash, v1 (wtc disagrees with the SEC_ERROR_LIBRARY_FAILUREerror code) → fix certutil crash, v1 (checked in on trunk)
Assignee | ||
Comment 36•17 years ago
|
||
Comment on attachment 329623 [details] [diff] [review]
lowest risk subset patch to disallow invalid handles (checked in on trunk and branch, with requested changes)
I replaced the commented-out assertion, but I intend to put it back in,
(not commented out) when we get the rest of this mess of disabled slots
working better.
Checking in dev/devtoken.c; new revision: 1.49; previous revision: 1.48
Attachment #329623 -
Attachment description: lowest risk subset patch to disallow invalid handles → lowest risk subset patch to disallow invalid handles (checked in on trunk)
Assignee | ||
Comment 37•17 years ago
|
||
Comment on attachment 329171 [details] [diff] [review]
fix certutil crash, v1 (checked in on trunk and branch)
Checking in certutil.c; new revision: 1.139; previous revision: 1.138
Assignee | ||
Comment 38•17 years ago
|
||
I was about to commit v1 of this patch on the trunk when I realized that
there are two nearly identical functions in this source file, and the
existing patch only fixes one of them. This patch fixes both.
It also addresses Wan-Teh's request for a better variable name.
This patch is for the trunk, while the previous patch was for the branch.
I can produce a branch flavor of this patch, if desired.
Attachment #329184 -
Attachment is obsolete: true
Attachment #330715 -
Flags: superreview?(wtc)
Attachment #330715 -
Flags: review?(julien.pierre.boogz)
Attachment #329184 -
Flags: review?(alexei.volkov.bugs)
Assignee | ||
Comment 39•17 years ago
|
||
This is the same as the previous patch, but is for the branch.
This patch should compare to the previous (v1) version of this patch.
Comment 40•17 years ago
|
||
Comment on attachment 330715 [details] [diff] [review]
Don't abort search for certs when a slot reports error, v2 (checked in with requested fix on trunk and branch)
This is OK, but I have the same remarks about nssTrustDomain_FindCertificatesBySubject as I did for the previous patch.
Attachment #330715 -
Flags: review?(julien.pierre.boogz) → review+
Comment 41•17 years ago
|
||
Comment on attachment 330715 [details] [diff] [review]
Don't abort search for certs when a slot reports error, v2 (checked in with requested fix on trunk and branch)
r=wtc.
It seems that we can define a common function that takes
a pointer to nssToken_FindCertificatesByNickname or
nssToken_FindCertificatesBySubject as an argument.
Attachment #330715 -
Flags: superreview?(wtc) → superreview+
Comment 42•17 years ago
|
||
Nelson,
(In reply to comment #34) Julien wrote:
> (From update of attachment 329184 [details] [diff] [review])
>
> The whole counting business is a bit confusing. If everything goes right, then
> count should at most be equal to maximumOpt, but never greater. Yet we allow
> that case and ignore it.
The original code does assume that count <= maximumOpt.
This is why the test in the code below is == 0, not <= 0.
- numRemaining = maximumOpt - count;
- if (numRemaining == 0) break;
To match that, in the new code we should use == or >=
in the following new code.
+ if (collectionCount > maximumOpt)
+ break;
Otherwise, we may continue the loop with numRemaining == 0,
and 0 has a special meaning.
Comment 43•17 years ago
|
||
Comment on attachment 330715 [details] [diff] [review]
Don't abort search for certs when a slot reports error, v2 (checked in with requested fix on trunk and branch)
I withdrew my r= and wait for Nelson's reply to
my comment 42.
Attachment #330715 -
Flags: superreview+ → superreview?(wtc)
Assignee | ||
Comment 44•17 years ago
|
||
I agree the new test should be
+ if (collectionCount >= maximumOpt)
+ break;
Assignee | ||
Comment 45•17 years ago
|
||
Both functions nssTrustDomain_FindCertificatesByNickname and
nssTrustDomain_FindCertificatesBySubject start by getting a list of
"active slots" to search for certs matching the criteria. They get that list
by calling nssTrustDomain_GetActiveSlots. Sadly, the returned list of slots
may contain slots that have been "disabled" for various reasons, and/or have
no tokens present. NSS would have been unable to establish any session with
such slots, and would have only an invalid session handle for them. Yet
despite these obvious disqualifications, these slots have been placed into
the "active slots" list. Consequently, when NSS attempts to search those
slots, it gets errors about using an invalid session handle.
Ideally, I'd really like to keep all slots for which no valid session handle
exists out of the active slots list completely. This would be very easy to
do, and the existing patches do it, except for the fact that the state of
being "disabled" is NOT always permanent (for the lifetime of a process).
Slots with removable tokens are also "disabled" while the tokens are absent.
We want to be sure that such slots ARE treated as "active slots" when a
token is (re)inserted. The concern is that, if we remove all slots that are
disabled from the active token list, perhaps slots with removable tokens
will not be utilized once they are reinserted.
This is part of the price we pay for lack of design documentation.
It's not clear how slots with removable tokens are supposed to work.
The choices seem to be:
a) keep disabled slots out of the active slot list, but update that list
when a token is inserted into a slot, or
b) keep disabled slots in the active slot list, but avoid actually trying
to use them when walking down the slot list, if their tokens are removable
and absent.
Comment 46•17 years ago
|
||
Comment on attachment 330715 [details] [diff] [review]
Don't abort search for certs when a slot reports error, v2 (checked in with requested fix on trunk and branch)
r=wtc. Please remember to change the new test in
nssTrustDomain_FindCertificatesByNickname to
if (collectionCount >= maximumOpt). The new test
in nssTrustDomain_FindCertificatesBySubject is
correct.
Attachment #330715 -
Attachment description: Don't abort search for certs when a slot reports error, v2 (for trunk) → Don't abort search for certs when a slot reports error, v2 (for trunk) (wtc: use collectionCount >= maximumOpt in one of the new tests)
Attachment #330715 -
Flags: superreview?(wtc) → superreview+
Assignee | ||
Comment 47•17 years ago
|
||
Comment on attachment 330715 [details] [diff] [review]
Don't abort search for certs when a slot reports error, v2 (checked in with requested fix on trunk and branch)
Checking in lib/pki/trustdomain.c; new revision: 1.58; previous revision: 1.57
Attachment #330715 -
Attachment description: Don't abort search for certs when a slot reports error, v2 (for trunk) (wtc: use collectionCount >= maximumOpt in one of the new tests) → Don't abort search for certs when a slot reports error, v2 (for trunk) (checked in with requested fix)
Assignee | ||
Comment 48•17 years ago
|
||
Comment on attachment 329622 [details] [diff] [review]
don't allow (session->handle == CK_INVALID_SESSION), v4
This patch has the effect that the slot->nssToken object is not created
during initialization if the removable token is not present. That would
be OK, in my opinion, if it got created at a later time, such as when
the token is inserted into the slot. But that doesn't happen.
STAN_InitTokenForSlotInfo only gets calls during initialization, not
when a token gets plugged in. So, with this patch, we see strange crashes
when we attempt to authenticate to a token that has been plugged in after
initialization. NSS gets quite far, and even successfully authenticates
to the underlying token, before NSS discovers that the nssToken object is
missing (NULL).
There seems to be a fundamental choice to make here. Either
a) the slot->nssToken object always gets created during initialization,
even when there is no token in the slot, and we deal with all the problems
that can occur when the session pointers in that token object are NULL, or
b) we allow slot->nssToken to be null if no token is present during
initialization (as this patch does now), and we change NSS to create the
nssToken object when the token is inserted.
I am presently working on a patch to do the former.
I find that I must test all patches alternately on two systems,
1) a system with removable tokens, and
2) a system with a slot that gets marked disabled during startup.
Unfortunately for me, the two systems I have for those purposes have
different types of CPUs, different OSes, and one is only accessible to
me through a VPN that only works sometimes.
Attachment #329622 -
Attachment is obsolete: true
Attachment #329622 -
Flags: review?(rrelyea)
Assignee | ||
Comment 49•17 years ago
|
||
Comment on attachment 329190 [details] [diff] [review]
use PK11_IsDisabled to detect and avoid disabled slots, v2
The separation of this patch from the patch to detect and disallow
session objects with invalid session handles was and is artificial.
It seems like these should be separable, but so far, it has been
necessary to always update both at the same time. So, going
forward, I will just combine the two into a bigger patch.
Attachment #329190 -
Attachment is obsolete: true
Attachment #329190 -
Flags: review?(rrelyea)
Assignee | ||
Comment 50•17 years ago
|
||
This patch (for the trunk) works on both my Windows system with removable
tokens, and on a S10 system with the module that reports absent non-removable
tokens. I've restored the assertion that detects an attempt to call
find_objects with an invalid (zero) session handle, and the assertion is not
hit on either system (a big improvement over the existing code).
This patch is one of two alternatives I'm contemplating. The other would
not create a token object when no token is present during initialization,
but would create that object later, if/when it was inserted. I think that's
cleaner, but it might be a much bigger change.
I found a sequence of operations that caused a crash with the previous
version of this patch, but does not cause a crash with this patch. I will
record it here for posterity. It is:
1) start the browser with the slot empty
2) insert the token
3) start a login to the token (using the device manager)
4) enter a wrong password, then
5) on the second try, cancel. You will get a dialog saying login failed.
6) start another login using the device manager
7) enter the correct password
Boom.
I don't know what was special about that sequence of operations, but without
the failed attempts, entering the correct password worked as expected.
Assignee | ||
Comment 51•17 years ago
|
||
Comment on attachment 331456 [details] [diff] [review]
create token object, even when there is no session, for trunk, v5
In this patch, I made the changes previously suggested by Wan-Teh.
I set an error code, using PORT_SetError, when the code detects an
attempt to use a disabled slot, and I discard an empty slot list
in nssTrustDomain_GetActiveSlots, rather than returning an empty list.
That code path should probably set an error code, too.
I chose to use PORT_SetError rather than an NSS error code because
I'm not sure that any code actually pays attention to NSS error codes
anywhere. I wanted to be sure that the error code would actually be
noticed by callers of NSS, if they call PR_GetError. If setting an
NSS error code also assures that outcome, I should consider that,
but I don't believe it does.
Attachment #331456 -
Flags: superreview?(rrelyea)
Attachment #331456 -
Flags: review?(wtc)
Assignee | ||
Comment 52•17 years ago
|
||
This patch is the same as the previous one, but it ignores whitespace.
It may be easier to review this copy than the the previous one.
(It's a shame that bugzilla's diff viewing tool can't do this with regular
patches.)
Assignee | ||
Comment 53•17 years ago
|
||
Comment on attachment 329171 [details] [diff] [review]
fix certutil crash, v1 (checked in on trunk and branch)
cmd/certutil/certutil.c; new revision: 1.97.2.14; previous revision: 1.97.2.13
Attachment #329171 -
Attachment description: fix certutil crash, v1 (checked in on trunk) → fix certutil crash, v1 (checked in on trunk and branch)
Assignee | ||
Comment 54•17 years ago
|
||
Comment on attachment 329623 [details] [diff] [review]
lowest risk subset patch to disallow invalid handles (checked in on trunk and branch, with requested changes)
lib/dev/devtoken.c; new revision: 1.39.28.2; previous revision: 1.39.28.1
Attachment #329623 -
Attachment description: lowest risk subset patch to disallow invalid handles (checked in on trunk) → lowest risk subset patch to disallow invalid handles (checked in on trunk and branch, with requested changes)
Assignee | ||
Updated•17 years ago
|
Attachment #330716 -
Attachment is obsolete: true
Assignee | ||
Comment 55•17 years ago
|
||
Comment on attachment 330715 [details] [diff] [review]
Don't abort search for certs when a slot reports error, v2 (checked in with requested fix on trunk and branch)
lib/pki/trustdomain.c; new revision: 1.51.28.6; previous revision: 1.51.28.5
Attachment #330715 -
Attachment description: Don't abort search for certs when a slot reports error, v2 (for trunk) (checked in with requested fix) → Don't abort search for certs when a slot reports error, v2 (checked in with requested fix on trunk and branch)
Assignee | ||
Updated•17 years ago
|
Attachment #331456 -
Flags: review?(julien.pierre.boogz)
Comment 56•17 years ago
|
||
Comment on attachment 331522 [details] [diff] [review]
create token object, even when there is no session, for trunk, v5 (IGNORING WHITESPACE)
1) There is a problem in the following code sequence :
nssSession_ExitMonitor(session);
/* token notr emoved, finished */
if (session->handle != CK_INVALID_SESSION)
return PR_TRUE;
The issue is that session->handle is being read after the monitor on the session has been released. The session handle may therefore have changed due to another thread operating on the default session. The handle check needs to happen before the monitor is released.
This is a problem in existing code, but I feel that it needs to be fixed in this patch, since it rewrites some of the code that contained this bug.
2) in find_objects, doesn't somebody need to acquire the session monitor also, in order to check session ?
There is an existing if check for session->handle, which is done outside the monitor ownership. That seems wrong, because again, session->handle could change when another thread acquires the monitor. And therefore, the new assertion would also be in the wrong place.
Attachment #331522 -
Flags: review-
Comment 57•17 years ago
|
||
Note that I reviewed the whole patch and the rest looks fine to me.
Comment 58•17 years ago
|
||
Comment on attachment 331456 [details] [diff] [review]
create token object, even when there is no session, for trunk, v5
See my comments on the same patch "IGNORING WHITESPACE".
Attachment #331456 -
Flags: review?(julien.pierre.boogz) → review-
Assignee | ||
Comment 59•17 years ago
|
||
Julien, I believe the session monitor protects the opaque objects inside
the PKCS#11 module that are identified by the session handle, objects such
as the contexts of multi-part hash or encryption operations that may be
undertaken on a session (using a session handle) in the module.
I perceive that, as used in NSS, the session monitor *does not protect* the
contents of an nssSession object, but rather protects the contents of the
underlying PKCS#11 module's data storage associated with the session handle
held in that object.
I think you will find that the monitor is typically held only around the
calls to one (or a few) PKCS#11 API functions that use a session handle as
an argument, and is not held generally around all code that uses an
nssSession object.
If the monitor was intended to protect the nssSession object itself, one
might imagine that functions such as nssToken_GetDefaultSession, that return
pointers to these objects, would acquire those monitors. But they do not.
I don't mind changing nssSlot_IsTokenPresent in this patch, as you suggested,
so that the decision of whether or not the handle is invalid is made while
holding the session monitor, and will do so. But I think that the questions
you're raising are, essentially:
- what all is protected by the monitor in the nssSession object? and
- can session->handle be referenced without holding that monitor?
and implicitly
- are nssSession objects adequately protected by the current code?
If the answer to this last question is negative, then I think the scope and
magnitude of the necessary changes are MUCH greater than the scope and
magnitude of this patch, and I would say that this patch should not be held
up while those larger changes are contemplated.
Comment 60•17 years ago
|
||
Nelson,
Re: comment 59,
I am not sure that the purpose you state forthe session monitor (which may be correct) is the only one.
If this monitor does not protect the content of an nssSession object, then what does ?
If you look at code from nssSlot_IsTokenPresent in the diff for the patch, you will see the following existing assignment :
session->handle = CK_INVALID_SESSION;
This assignment is done while the monitor is held. Is that accidental or intentional ?
In other places, for example in find_objects, there is code that reads session->handle . It tries to check that it's valid before it goes to go a PKCS#11 call with that handle. But it doesn't check after acquiring the monitor if that session handle is still valid.
So, what I see there is a race - the session handle can be invalidated between the time it has been checked and the time it's being used to make a call.
Is there some other higher-level global lock I am missing that prevents this race from actually happening ?
So far, I am not seeing it. I think protecting the content of the object was one of the purposes of the nssMonitor, but it seems to have been applied only to writes and not to reads.
Assignee | ||
Comment 61•17 years ago
|
||
> If this monitor does not protect the content of an nssSession object, then
> what does ?
Good question. Maybe nothing. I think that if you survey all the code
that uses that object, you will find that most of it uses nothing.
If that is the case, then that may need to be fixed, but it is beyond the
scope of this bug.
> This assignment is done while the monitor is held. Is that accidental or
> intentional ?
My guess is that it is essentially accidental. I don't think it indicates
that the monitor is used for that purpose in a systematic way throughout
NSS. But I'd be happy to be wrong.
I'm going way out on a limb here, and *guessing* that the objects that
represent modules, slots, PK11Slots, and nssSessions simply aren't
systematically locked when used. I suspect the rationale is something like
this: Most of these things are entirely static, setup once during init,
and then effectively read-only after that. No race occurs while they are
being initialized, and there is no need to lock them while they are in that
read-only state.
Of course, removable tokens aren't quite so static. :)
Take a look at the functions that call nssTrustDomain_GetActiveSlots, such as nssTrustDomain_FindCertificatesByNickname,
nssTrustDomain_FindCertificatesBySubject,
(which my patch improves), and also
nssTrustDomain_FindCertificateByIssuerAndSerialNumber
NSSTrustDomain_TraverseCertificates and
NSSTrustDomain_TraverseUserCertificates
nssTrustDomain_FindTrustForCertificate
nssTrustDomain_FindCRLsBySubject
(which all need to be fixed in the same way as nssTrustDomain_FindCertificatesBySubject was, I think :-/ )
and the functions they call, such as nssTrustDomain_GetActiveSlots and nssTrustDomain_GetSessionForToken
As those functions get and traverse lists of slots and tokens, calling
nssTrustDomain_GetSessionForToken for each token, none of them appear to do
any locking of any of those objects. I'm not saying that's correct or good,
but I'm saying that if we're going to change it, we must do that in a
systematic way that affects all the relevants parts of NSS at once. Until
such time as we decide to do that, I think we should not attempt to impose a
new set of locking requirements on patches that fix bugs in the existing code.
Assignee | ||
Comment 62•17 years ago
|
||
This is the same as v5, except for a tiny difference in PK11_IsTokenPresent
suggested by Julien.
Attachment #331456 -
Attachment is obsolete: true
Attachment #331522 -
Attachment is obsolete: true
Attachment #331699 -
Flags: superreview?(rrelyea)
Attachment #331699 -
Flags: review?(wtc)
Attachment #331456 -
Flags: superreview?(rrelyea)
Attachment #331456 -
Flags: review?(wtc)
Comment 63•17 years ago
|
||
OK, Just some answers to the questions in the comments....
RE pk11pub.h in lib/dev and lib/pki --- It's not a layer violation. PK11_wrap is logically below the stan code. They only issue is a pure stan implementation should have a disable at the stan level. Your idea of creating a stan "is it disabled" is probably most correct, though there isn't anything wrong with what you actually did.
RE temporary disable. The ony way to temporarily disable a slot is with the user functions PK11_UserDisableSlot() and PK11_UserEnableSlot(). I don't know of any applications that currently use them, but for correctness anything we do to the disabled slot in the normal init path, we should also do with PK11_UserDisableSlot() and we should undo with PK11_UserEnableSlot().
Looking at the code, there appears to be a bug in PK11_UserEnableSlot(). It will enable a slot even if it was disabled 'automatically'. It should probably only renable slots that were disabled by the user.
NOTE:I have not yet looked at the patches, but disables slots should still show up in queries for all the slots in the system. In short, mozilla should still show the slots in it's device UI.
RE: locking of session handles. -- the global session handle is purposefully not protected by a lock. It's considered an 'invariant' in the code even though it can change. That change is always triggered by a token removal. In all those cases all the existing operations on that token are expected to fail or not more or less randomly. In general you cannot lock the validity of a session handle because you can't prevent the user from invalidating it by pulling the token.
I'll go look at the patches themselves now....
bob
Comment 64•17 years ago
|
||
re my comments on session handles.
Those comments apply to slot->sessionHandles, which are integers. We may in fact have to look more closely at the stan session objects, if we don't at least hold a reference to them...
bob
Comment 65•17 years ago
|
||
Bob,
Re: comment 63 and comment 64,
About session handles - are you saying that it doesn't matter if the value of the value of the session handle is changed randomly by another thread in the middle of the execution of those functions ?
If so, then we should remove all the code that checks that the handle is not CK_INVALID_HOME, as well as the assertions.
I can see how we can never ensure that the session handle is valid - the session of course becomes invalid the moment the token is unplugged - but that's different from the handle to the session being changed.
Comment 66•17 years ago
|
||
I meant CK_INVALID_SESSION . sigh.
Comment 67•17 years ago
|
||
Comment on attachment 331699 [details] [diff] [review]
create token object, even when there is no session, for trunk, v6 (checked in)
r+
With the following observations...
I moving the session handle check inside the monitor isn't necessary, but doesn't hurt (so it's ok to keep it).
I'm more worred about the lifetime of the session object itself. It doesn't appear to be ref-counted as I would have expected.
in line 1207 the added crv = appears to be redundant (that is it is set but never checked). I believe this semantic is correct (It doesn't seem right to necessarily fail a token just because it's random number generator returned an error).
I believe this code will function correctly WRT PK11_UserDisableSlot()/PK11_UserEnableSlot()
since the slot left out of the trust domain on init. A user disabled slot will remain on the trust domain, but will disappear for any Active list. It would show up again when it's turned on.
I see some things that should probably be looked at namely:
1) life cycle of session objects.
2) Only renabling User Disabled tokens.
both of which are pre existing conditions so should not hold up this patch.
bob
Attachment #331699 -
Flags: superreview?(rrelyea) → superreview+
Assignee | ||
Comment 68•17 years ago
|
||
Comment on attachment 331699 [details] [diff] [review]
create token object, even when there is no session, for trunk, v6 (checked in)
Thanks, Bob,
I expect (and hope) there will be some more discussion about all this.
You're right that the addition of "crv =" at line 1212 in PK11_InitToken
didn't actually change the essential behavior of the function any.
It just allowed me to see what value was being returned by PKCS#11 modules
for which the source was not handy (so, stepping into the function wasn't
possible), while debugging, and set breakpoints conditioned on those return
values. I suspected that this function was returning errors, and sure enough.
>- PK11_GETTAB(slot)->C_SeedRandom(slot->session,
>+ crv = PK11_GETTAB(slot)->C_SeedRandom(slot->session,
> random_bytes, sizeof(random_bytes));
One more question. I wanted to put the assertion that the session handle
was not zero into find_objects. I really want us to stop asking modules
to use invalid handles. I figure that if NSS is working correctly, we'll
never this this assertion, and if it's not, we'll find out and fix it.
But there is some concern that perhaps NSS has more ways of using invalid
session handles than we know about, and this assertion might be perceived
as a reliability regression for FF3, especially in developers' builds.
So, what do you think of the advisability of that assertion?
Assignee | ||
Comment 69•17 years ago
|
||
Bob, even though you've SR'ed my latest patch for this bug, I'd still like to
explore alternatives with you.
One of the things that bothers me about this patch is that, with it, there is
ALWAYS an nssToken object allocated and present, even when there is no token
in the slot. The object sits there, with many flags and variables that purport
to describe the state of an object that does not exist. This seems like a
recipe for lots of bugs (like the ones this bug report is trying to fix).
I would prefer that there be NO nssToken object when the slot is empty. But I
tried to make that happen, and found that there were lots of problems with it.
In particular, there seems to be no code path to create an nssToken except
during module initialization.
I had a patch that deleted the nssToken when the token was removed. In
testing it, I was surprised to find that NSS was able to go so far as to
actually login to the token, even though there was no nssToken object present.
This suggests to me that some (perhaps much) of the information that is needed
to interact with a token is not stored in the right object. It is possible to
login to a token in the absence of an nssToken object because the data needed
to do so is not stored in the nssToken object, but rather is somewhere else.
Do you agree, in principle, with the idea that nssToken objects should not
exist when the slot is empty, and with the idea that it ought not to be
possible to interact meaningfully with a token without having an nssToken object?
Comment 70•17 years ago
|
||
re NULL session handle assert. I'm of two minds... one, we shouldn't be calling find will a null session handle, but 2 It's good to call the module once in a while with a NULL session handle. PKCS #11 modules are supposed to handle that correctly (e.i. return invalid session), and we do use invalid session handle internally to signal error states. I can't think of a case, however, when we should call find with such a handle, so the assert is probably good.
RE nssToken object. Well what you describe is exactly what was supposed to happen. Token objects were only suppose to exist if the token was present, and was supposed to be deleted when the token was removed. Ian evidently didn't quite implement it that way, or at least he didn't in the non-pure Stan case.
nssToken objects are stan objects, so it's not surprising you make it quite a ways before you notice the lack. All the old wrapper code still uses the PK11SlotInfo object as the basic knowledge of the slot/token. In Stan that task is split between the nssSlot and nssToken objects.
Part of the problem may be in the 'mixed' code Ian slaved the nssToken object to the PK11SlotInfo rather than the nssSlot.
Anyway, I agree with your principle.
bob
Comment 71•17 years ago
|
||
Bob,
I agree the assert is good, but only as long as we assert on the handle that will actually be used for the PKCS#11 call that follows. Because nssSession isn't locked, that's not necessarily the case today, as session->handle can change during the test or assert, and the actual PKCS#11 call. The problem can be solved in several ways :
1) short-term simple fix : read session->handle only once, to a local variable, and then test/assert on that local variable, and then use that local variable for the handle passed to the PKCS#11 call
2) longer-term : properly lock nssSession so that handle is properly protected .
This is a pre-existing problem that shouldn't hold up the patch for this bug. There should probably be a separate bug, at this point.
Assignee | ||
Comment 72•17 years ago
|
||
Comment on attachment 331699 [details] [diff] [review]
create token object, even when there is no session, for trunk, v6 (checked in)
Seeking second review for branch.
Attachment #331699 -
Flags: review?(julien.pierre.boogz)
Updated•17 years ago
|
Attachment #331699 -
Flags: review?(julien.pierre.boogz) → review+
Comment 73•17 years ago
|
||
re locking the session handle. (long term)
So we shouldn't have to lock the session handle, but we should make the session handle object reference counted.
the session handle within the session handle object should be invariant.
bob
Assignee | ||
Comment 74•17 years ago
|
||
On trunk
lib/dev/devslot.c; new revision: 1.24; previous revision: 1.23
lib/dev/devtoken.c; new revision: 1.50; previous revision: 1.49
lib/pk11wrap/dev3hack.c; new revision: 1.24; previous revision: 1.23
lib/pk11wrap/pk11slot.c; new revision: 1.94; previous revision: 1.93
lib/pki/pki3hack.c; new revision: 1.96; previous revision: 1.95
lib/pki/trustdomain.c; new revision: 1.59; previous revision: 1.58
Assignee | ||
Updated•17 years ago
|
Attachment #331699 -
Flags: review?(wtc)
Assignee | ||
Comment 75•17 years ago
|
||
On Branch:
dev/devslot.c; new revision: 1.22.2.2; previous revision: 1.22.2.1
dev/devtoken.c; new revision: 1.39.28.3; previous revision: 1.39.28.2
pk11wrap/dev3hack.c; new revision: 1.21.28.2; previous revision: 1.21.28.1
pk11wrap/pk11slot.c; new revision: 1.87.2.3; previous revision: 1.87.2.2
pki/pki3hack.c; new revision: 1.86.28.7; previous revision: 1.86.28.6
pki/trustdomain.c; new revision: 1.51.28.7; previous revision: 1.51.28.6
Watching tinderbox now.
Assignee | ||
Comment 76•17 years ago
|
||
In comment 61, I wrote that there was a list of functions that call
nssTrustDomain_GetActiveSlots and all need to be fixed. Some of them
are now fixed, and some are not.
> nssTrustDomain_FindCertificatesByNickname, (now fixed)
> nssTrustDomain_FindCertificatesBySubject, (now fixed)
> nssTrustDomain_FindCertificateByIssuerAndSerialNumber (still buggy)
> NSSTrustDomain_TraverseCertificates (still buggy)
> NSSTrustDomain_TraverseUserCertificates (still buggy)
> nssTrustDomain_FindTrustForCertificate (still buggy)
> nssTrustDomain_FindCRLsBySubject (still buggy)
I wonder if it makes sense to consider this bug fixed while some functions
remain that continue to abort their loops at the first bad slot they
encounter. I rather doubt it. :(
Assignee | ||
Comment 77•17 years ago
|
||
Assignee | ||
Comment 78•16 years ago
|
||
I am trying to determine how to test the patches to each of the following
functions:
> nssTrustDomain_FindCertificateByIssuerAndSerialNumber
- called when verifying a signature with the old PKCS7 library
sec_pkcs7_verify_signature -> CERT_FindCertByIssuerAndSN
- MAY be called to verify CMS signature in libSMIME
NSS_CMSSignerInfo_GetSigningCertificate -> CERT_FindCertByIssuerAndSN
> NSSTrustDomain_TraverseCertificates
called from CERT_GetCertNicknames
called from NSS_GetClientAuthData
called from CERT_FindUserCertsByUsage
called from CERT_MatchUserCert (dead)
called from various PSM functions
called from PK11_TraverseSlotCerts
called from CERT_GetSSLCACerts
used in signtool commands to list certs
called from PK11_ListCerts
used by certutil -L (among many others)
> NSSTrustDomain_TraverseUserCertificates
No need to test. It's dead code, being removed.
> nssTrustDomain_FindTrustForCertificate
called from nssTrust_GetCERTCertTrustForCert
called from fill_CERTCertificateFields
called from stan_GetCERTCertificate
called from STAN_ChangeCertTrust
called from CERT_ChangeCertTrust
used in certutil -A, -M
called from __CERT_AddTempCertToPerm
> nssTrustDomain_FindCRLsBySubject
called from PK11_FindCrlByName
called from SEC_FindCrlByKeyOnSlot
called from crl_storeCRL
called from PK11_ImportCRL
called from SEC_NewCrl
called from CERT_ImportCRL
called from SECU_StoreCRL used in crlutil
called from crlutil's ImportCRL
Assignee | ||
Comment 79•16 years ago
|
||
Bob Relyea, please read this command and add your thoughts.
Bug 444974 was experiencing the same problems described in comment 0 above.
Some problem with a PKCS#11 module was causing the loops in trustdomain.c
to terminate prematurely, and this occurred only when the smart card was
plugged into the reader for the third-party PKCS#11 module.
Then the reporter tried FF 3.0.2 build 6, which incorporates NSS 3.12.1.1,
and his problem deteriorated from a mere "unknown issuer" error to a crash.
The problem is that there are many functions in nss/lib/dev/devtoken.c that
assume that the "defaultSession" field of the NSSToken structure will never
be NULL. We see this line of code in 9 places in that file:
nssSession *session = (sessionOpt) ? sessionOpt : token->defaultSession;
Thereafter, the code blindly assumes that session cannot be NULL.
We could change those functions to test session and try to do something
else if it is NULL. But I wonder: why are these functions even getting
called in that case? When we have no default session for a token, this
pretty much means the token is useless to us. So it seems to me we
should detect that at a high level and stop trying to use that token while
that condition is trie.
token->defaultSession is assigned in just two places, both in
nss/lib/pk11wrap/dev3hack.c. One in nssToken_CreateFromPK11SlotInfo and
one in nssToken_Refresh. Both places assign to it the value returned
from a call to nssSession_ImportNSS3Session. That function takes a
CK_SESSION_HANDLE argument, and returns a NULL value if that handle is 0,
or if the malloc fails. The most likely explanation for that function
to return NULL is that the CK_SESSION_HANDLE passed to it was the magic
value CK_INVALID_SESSION, which is 0.
This begs the question, why are the callers of nssSession_ImportNSS3Session
passing it CK_INVALID_SESSION? Why are they even trying to do what they're
doing with an unusable token? Should those callers do something else instead?
and if so, what?
The BIG question I have is:
- Should we allow an NSSToken to have a null defaultSession pointer, and
require that every function that tries to use it test it for NULL? *OR*
- Should we disallow NSSTokens to have null defaultSession pointers, and
refuse to create them, or destroy them, when that value is NULL?
Disallowing NSSTokens with null defaultSessions would avoid getting into the functions that would try to use the null defaultSession in the first place.
Those functions will generally not be called with a NULL NSSToken pointer.
However, if we destroy the NSSToken structure when the default session goes
to NULL, then we have to re-create it when a token is plugged in, something
that NSS does not do now. But I think that is the right solution.
Bob, I'd like your help in figuring out how to solve this problem.
Assignee | ||
Updated•16 years ago
|
Attachment #331699 -
Attachment description: create token object, even when there is no session, for trunk, v6 → create token object, even when there is no session, for trunk, v6 (checked in)
Assignee | ||
Comment 80•16 years ago
|
||
This link shows all the references to token->defaultSession. All the places
that assume that it is non-NULL are in the list from this search.
http://mxr.mozilla.org/security/search?string=defaultSession&case=on&find=nss%2Flib&findi=&filter=%5E%5B%5E%5C0%5D*%24&hitlimit=&tree=security
No token object should exist with a NULL defaultSession. IMO, the right
fix is to ensure that token objects do not exist with NULL defaultSessions.
That fix will, in general, avoid any of these functions from being called.
Assignee | ||
Comment 81•16 years ago
|
||
The code that should first detect that a token has no session includes the
function nssToken_CreateFromPK11SlotInfo, and all the callers of
nssToken_Refresh, which are listed at
http://mxr.mozilla.org/security/ident?i=nssToken_Refresh
They should destroy the token object if no session can be created for it.
I have previously developed a patch to do just that.
The unresolved issue with that patch was that it necessitates a new call to
nssToken_CreateFromPK11SlotInfo that would occur when a token is
(re)inserted into the slot. The existing code assumes that the token object
is always present.
Comment 82•16 years ago
|
||
re comment 79:
those are good questions. In the idealized stan design the token object itself wouldn't exist. The hybrid is definitively causing problems. I think the ideal fix would be to make the PK11SlotInfo * object point to the nssslot which would then point the the nsstoken. Unfortunately doing that surgery would require a pretty extensive patch with some careful review (particularly along the lines of object lifecycle).
Barring that extensive of a change, I'm wondering how we are getting into this code if the token isn't even present. If the token is present, we should have a valid default session. If we don't have a valid default session, we should treat the token as 'not present'. So the question is 1) how are we getting into these functions we effectively non-present tokens, and 2) if the token is present why is the session null. (NOTE: disable should imply not present).
bob
Assignee | ||
Comment 83•16 years ago
|
||
Comment on attachment 334034 [details] [diff] [review]
patch for remaining functions in trustdomain.c (checked in)
Bob, please review.
I'd like to get the patches for bug 200704,
bug 444974 and this bug all into NSS 3.12.next
Attachment #334034 -
Flags: review?(rrelyea)
Updated•16 years ago
|
Attachment #334034 -
Flags: review?(rrelyea) → review+
Assignee | ||
Comment 84•16 years ago
|
||
Comment on attachment 334034 [details] [diff] [review]
patch for remaining functions in trustdomain.c (checked in)
lib/pki/trustdomain.c; new revision: 1.60; previous revision: 1.59
Attachment #334034 -
Attachment description: patch for remaining functions in trustdomain.c → patch for remaining functions in trustdomain.c (checked in)
Assignee | ||
Comment 85•16 years ago
|
||
I think this is all fixed on the trunk. The remaining task is to back port
this work to the 3.11 branch. I will attempt to do that soon (probably at
least a day away though).
Assignee | ||
Comment 86•16 years ago
|
||
This patch for the 3.11. branch back ports changes that have already been
committed on the trunk for this and related bugs. It includes:
- the final patch for the trunk in this bug, for file trustdomain.c
- the patch for bug 444974 (which bug was revealed when the patches for
this bug were committed on the trunk)
- the patch for bug 431929
- a few miscellaneous changes (all null pointer detections), all of which
are on the trunk now, but not in 3.11, and
- the difference between patch v5 and patch v6 of the patch for this bug
which was named "create token object, even when there is no session".
It seems that the version checked in on the branch was v5. This patch
includes the diffs between v5 and v6, all in file dev/devslot.c.
Since all these changes have already been reviewed for the trunk, I'm just going to ask for one additional review for this patch.
Attachment #344595 -
Flags: review?
Assignee | ||
Updated•16 years ago
|
Attachment #344595 -
Flags: review? → review?(julien.pierre.boogz)
Assignee | ||
Comment 87•16 years ago
|
||
Comment on attachment 344595 [details] [diff] [review]
roll-up patch for 3.11.10 (checked in on branch)
The major components of this patch are parts of
other patches, from this bug and others.
The include:
This bug
"patch for remaining functions in trustdomain.c"
https://bugzilla.mozilla.org/attachment.cgi?id=334034
Bug 444974
"Stop assuming session pointers are non-NULL"
https://bugzilla.mozilla.org/attachment.cgi?id=339706
This bug, the difference between v5 and v6 of the patch
"create token object, even when there is no session, for trunk"
https://bugzilla.mozilla.org/attachment.cgi?id=331456 (v5)
https://bugzilla.mozilla.org/attachment.cgi?id=331699 (v6)
This difference is the entire change for dev/devslot.c
in this patch.
Assignee | ||
Comment 88•16 years ago
|
||
For bug 431929, the patch "Refactoring patch" is
https://bugzilla.mozilla.org/attachment.cgi?id=321344
Assignee | ||
Comment 89•16 years ago
|
||
Comment on attachment 344595 [details] [diff] [review]
roll-up patch for 3.11.10 (checked in on branch)
Looking for a reviewer for this patch, which back ports fixes from trunk to branch.
This is holding up NSS 3.11.10, and Julien has been given other tasks of higher priority.
Attachment #344595 -
Flags: superreview?(wtc)
Attachment #344595 -
Flags: review?(rrelyea)
Updated•16 years ago
|
Attachment #344595 -
Flags: review?(rrelyea) → review+
Comment 90•16 years ago
|
||
Comment on attachment 344595 [details] [diff] [review]
roll-up patch for 3.11.10 (checked in on branch)
r+ this is quite a rollup bug. (it may have been better to review the separate component parts as there are several logical changes).
Most changes introduced no new semantic, only rearraging of the code.
The rest did the following:
1) added more fault tolerance to the code, detecting failures in the session.
2) in the CRL lookup, we now no longer fail if we have a failure in one token.
Both changes appear to be desirable and correct.
bob
Assignee | ||
Comment 91•16 years ago
|
||
Comment on attachment 344595 [details] [diff] [review]
roll-up patch for 3.11.10 (checked in on branch)
seeking another review for branch
Attachment #344595 -
Flags: review?(alexei.volkov.bugs)
Updated•16 years ago
|
Attachment #344595 -
Flags: superreview?(wtc)
Comment 92•16 years ago
|
||
Nelson,
Re: attachment 344595 [details] [diff] [review], the change in devslot.c doesn't seem to correspond to any of the other patches you referenced. That change is not on the trunk also and probably should be, if it's going to the branch.
Assignee | ||
Comment 93•16 years ago
|
||
Re comment 92, yes, I mentioned that issue with devslot.c in comment 86.
I had planned to check it in when when I checkin the fix for the 3.11 branch.
Comment 94•16 years ago
|
||
Nelson,
Re: comment 93 and 87, I missed that. So many patches.
The description of attachment 331699 [details] [diff] [review] says "create token object, even when there is no session, for trunk, v6 (checked in)". I guess that's incorrect if only v5 was checked in. Since the v6 patch got 2 reviews, the remainder should be checked in to the trunk.
Because there are so many patches involved here, I'm first verifying that the roll-up patch is the sum of the others, modulo the changes you listed, and then I will review each patch individually.
Comment 95•16 years ago
|
||
Comment on attachment 344595 [details] [diff] [review]
roll-up patch for 3.11.10 (checked in on branch)
Alexei has taken over this review. Taking myself off.
Attachment #344595 -
Flags: review?(julien.pierre.boogz)
Comment 96•16 years ago
|
||
Comment on attachment 344595 [details] [diff] [review]
roll-up patch for 3.11.10 (checked in on branch)
Patch looks good r+.
Two things: it partially includes Julien's patch for bug 232392. I guess we should not forget to integrate the rest of the patch into the branch later.
Another thing is how we create and destroy nssToken default session. It always assumed that the session will be created on the other object(not token) arena. But, for the constructor(nssSession_ImportNSS3Session function http://mxr.mozilla.org/security/ident?i=nssSession_ImportNSS3Session)
arena is optional. Token destructor does not do any thing to session, thinking that it belongs to another object arena. So my question is where a possibility, that nssToken session will be created without use of arena?
Attachment #344595 -
Flags: review?(alexei.volkov.bugs) → review+
Assignee | ||
Comment 97•16 years ago
|
||
Comment on attachment 331699 [details] [diff] [review]
create token object, even when there is no session, for trunk, v6 (checked in)
Previously I had accidentally checked in version 5 of this patch,
for file devslot.c, even though version 6 was the reviewed version.
Now I have corrected that by committing the diffs.
nss/lib/dev/devslot.c; new revision: 1.25; previous revision: 1.24
Comment 98•16 years ago
|
||
This bug is the last pending fix for NSS 3.11.10 (AFAIK). Is there anything blocking the resolution of this bug ?
Assignee | ||
Comment 99•16 years ago
|
||
Comment on attachment 344595 [details] [diff] [review]
roll-up patch for 3.11.10 (checked in on branch)
Committed on branch
dev/ckhelper.c; new: 1.34.28.2; previous: 1.34.28.1
dev/dev.h; new: 1.37.28.2; previous: 1.37.28.1
dev/devslot.c; new: 1.22.2.3; previous: 1.22.2.2
dev/devtoken.c; new: 1.39.28.4; previous: 1.39.28.3
dev/devutil.c; new: 1.26.28.2; previous: 1.26.28.1
pk11wrap/dev3hack.c; new: 1.21.28.3; previous: 1.21.28.2
pk11wrap/pk11cert.c; new: 1.143.2.14; previous: 1.143.2.13
pki/trustdomain.c; new: 1.51.28.8; previous: 1.51.28.7
Attachment #344595 -
Attachment description: roll-up patch for 3.11.10 → roll-up patch for 3.11.10 (checked in on branch)
Assignee | ||
Comment 100•16 years ago
|
||
with patches committed on trunk and branch, I am optimistically marking
this bug resolved/fixed.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•