Closed Bug 164501 Opened 22 years ago Closed 22 years ago

PK11_FindCrlByName returns invalid SECItem when running low on memory

Categories

(NSS :: Libraries, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: julien.pierre, Assigned: rrelyea)

References

Details

Attachments

(3 files, 2 obsolete files)

I'm doing some work from home on my OS/2 machine this week-end, so I used the
same large cert7.db database that holds my 26 MB CRL. Much to my surprise, NSS
is unable to correctly lookup the CRL when verifying a certificate using
certutil -V.

Further investigation shows that PK11_FindCrlByName returns a non-NULL SECItem*,
with a data field of NULL, and a length of NULL. However, it also returns a
non-zero CK_OBJECT_HANDLE, indicating that it found a CRL.

I tried this on an NSS 3.5 tree and the same problem occurred. So this is not
related to the recent memory optimizations in 3.6.

When I do the same thing on Solaris, both in 3.5 and 3.6, there is no problem,
and the CRL is found. So it seems that this bug only shows on OS/2 so This might
be yet another DBM issue. I will continue investigating.
FYI, PK11_LookupCrls, used for crlutil -L, works fine. It's only the lookup of
the CRL by subject that fails.

I traced the code some more and see that it's not DBM causing the problem - it
reads it fine. Somehow the object is getting lost when being returned to the
higher-level functions. This may be due to the excessive number of copies we do.
Despite being 32-bit, OS/2 has a per-process address space limit of 512MB. There
are other factors that reduce this limit, such as shared memory in use in the
OS. It is not impossible that I have hit this memory addressing limit in my
test, but I don't know for sure yet.

In fact, I just confirmed that this is the problem. The process was using more
than 300 MB of memory for some reason, and some calls to SECItem_DupItem were
returning NULL, and eventually this is what got returned to the top.

I disabled the arena free list by setting NSS_DISABLE_ARENA_FREE_LIST=1. Then
the memory usage of the process was much lower, and the CRL lookup succeeded !!!

Now for the interesting part about how I discovered this : I have a test that
checks for revocation status on a certificate listed on the CRL. On OS/2, the
test was returning that the certificate was valid !!!

What happened is that after getting the invalid SECItem, the higher-level code
tried to decode it. That failed. So it was unable to build a CERTSignedCrl*, and
returned NULL. Then SEC_CheckCrl believed that there was no CRL for the issuer,
and proceeded happily to validate the certificate.
Priority: -- → P1
Target Milestone: --- → 3.6
OS: OS/2 → All
Hardware: PC → All
Summary: PK11_FindCrlByName returns invalid SECItem on OS/2 → PK11_FindCrlByName returns invalid SECItem when running low of memory
Depends on: 160635
Blocks: 162983
This patch changes static function FreeArenaList in nsprpub/lib/ds/plarena.c
to free any arena block > 100KB rather than putting it on the free arena list.
This should provide some immediate reduction in memory usage when giant things
get allocated in an arena pool.

This patch also causes the blocks that are put on the free list to be cleared
and causes the pool's counter of arena blocks to be reduced by the number of
blocks moved to the free list.	  Review invited.  (I haven't tested this yet.)


I will also produce a patch that makes the arena free list be sorted in
ascending order of free block size.  This should make the first-fit allocation
algorithm perform as a best fit one.
This patch is an alternative to the previous one.  It keeps the freelist 
ordered by increasing arena block size, which causes the first-fit allocation
scheme to be a best-fit allocation scheme.  To shorten the time spent searching
the freelist for insertion or removal of a fitting block, the freelist is
broken into 4 lists, all protected by the same single lock.  The 4 lists are
used for different sizes of blocks.  The ranges of sizes for the 4 lists are:
    0k < x < 2k
    2k < x < 8k
    8k < x < 32k
   32k < x
comments invited.  Code is untested.
Oops. That last comment should have said:

    0k < x <= 2k
    2k < x <= 8k
    8k < x <= 32k
   32k < x 
Nelson,

I can't compile your patch. I think you made it against a different version of
NSPR than the one I'm using.
I am using the tip of NSPR.

I tried to rev plarena.c to release 3.6.2.1 which you patched, but I still get a
compile error :

[e:\dev\nss\36\mozilla\nsprpub\os22.45_icc_dbg.obj\lib\ds]gmake
icc -Foplarena.obj -c   /Q /W1 /Q /qlibansi /Gd+ /Gm+ /Su4 /Mp /Tl9 /Ti+    -UND
EBUG -DDEBUG_charm   -DDEBUG=1  -DXP_OS2=1  -DXP_PC=1  -DBSD_SELECT=1  -D_PR_GLO
BAL_THREADS_ONLY=1  -DXP_OS2_VACPP=1  -DOS2=4  -DTCPV40HDRS=1  -D_X86_=1  -DHAVE
_STRERROR=1   -DFORCE_PR_LOG -IE:/DEV/NSS/36/mozilla/security/nss/../../dist/OS2
2.45_icc_DBG.OBJ/include -I../../../pr/include  ../../../lib/ds/plarena.c
../../../lib/ds/plarena.c(306:38) : error EDC0068: Operation between types "unsi
gned long" and "struct PLArena*" is not allowed.
../../../lib/ds/plarena.c(306:42) : error EDC0045: Undeclared identifier base.
E:/DEV/MOZTOOLS/GMAKE.EXE: *** [plarena.obj] Error 12

[e:\dev\nss\36\mozilla\nsprpub\os22.45_icc_dbg.obj\lib\ds]

Please give me a patch for the NSPR tip.

I have a good test case to reproduce this problem - my cert verification test
program runs out of memory when the NSS arena free list is enabled, because it
hits the OS/2 512 MB process limit. It only runs successfully with the free list
disabled. So if I could get the test to work with your patch and the free list
enabled, that would be good progress.

I feel that the arena enhancement should probably be the subject of another bug
though, on NSPR, so feel free to open one.

In this bug, I will track down the place where we try to allocate memory and it
fails without us catching it.
The problem occurs in softoken/pcertdb.c, line 4824 .
The SECITEM_DupItem returns NULL. This error is ignored, and the function
returns SECSuccess .
Unfortunately, even checking for this condition and returning SECFailure doesn't
fix the problem.
This is a design issue - this API wasn't designed to support failures.
It was supposed to either return objects or no objects, but no in between, such
as "not enough memory to return the object that is present" which is this case.
So we get stuck with bogus data being returned.

However, this is an internal API, so it could be fixed with a better prototype,
as well additional checking in some of the layers. I'm afraid the stack is
rather big when this happens, though :

 Function                          | Part            
-----------------------------------+-----------------
 nsslowcert_FindCrlByKey           | PCERTDB.OBJ     
 pk11_getCrl                       | PKCS11U.OBJ     
 pk11_FindCrlAttribute             | PKCS11U.OBJ     
 pk11_FindTokenAttribute           | PKCS11U.OBJ     
 pk11_FindAttribute                | PKCS11U.OBJ     
 NSC_GetAttributeValue             | PKCS11.OBJ      
 nssCKObject_GetAttributes         | CKHELPER.OBJ    
 nssCryptokiCRL_GetAttributes      | CKHELPER.OBJ    
 nssCRL_Create                     | CERTIFICATE.OBJ 
 crl_createObject                  | PKIBASE.OBJ     
 nssPKIObjectCollection_GetObjects | PKIBASE.OBJ     
 nssPKIObjectCollection_GetCRLs    | PKIBASE.OBJ     
 nssTrustDomain_FindCRLsBySubject  | TRUSTDOMAIN.OBJ 
 PK11_FindCrlByName                | PK11CERT.OBJ    
 SEC_FindCrlByKeyOnSlot            | CRL.OBJ         
 DPCache_Fetch                     | CRL.OBJ         
 DPCache_Update                    | CRL.OBJ         
 AcquireDPCache                    | CRL.OBJ         
 CERT_CheckCRL                     | CRL.OBJ         
 SEC_CheckCRL                      | CERTVFY.OBJ     
 cert_VerifyCertChain              | CERTVFY.OBJ     
 CERT_VerifyCertificate            | CERTVFY.OBJ     
 VerifyCert                        | CERTUTIL.OBJ    
 ValidateCert                      | CERTUTIL.OBJ    
 main                              | CERTUTIL.OBJ    
 _start                            | EXESTRTI        
 0x1FFECE33                        | DOSCALL1.DLL:4  
The problem is that nssCKObject_GetAttributes() has to handle the case where we
may be asking for attributes that are not critical and may not be understood by
the higher level. This means the CRL that gets created has a NULL value.

Somewhere this value needs to be checked for validity, and if that check fails
we need to fail a crl validiation.

Changing the error code coming back from softoken won't affect this.
Comment on attachment 96763 [details] [diff] [review]
patch creates 4 freelists, sorted by size

This patch contains a syntax error.  The replacement patch will be added to bug
166701
Attachment #96763 - Attachment is obsolete: true
Attachment #96763 - Flags: needs-work+
Assigned the bug to Bob.
Assignee: wtc → relyea
The basic idea of this patch is that error types are sent up using
PORT_SetError(). In our search routine, if we simply don't find what we are
looking for, we set the error NOT_FOUND. Upper level search codes, when no CRL
is found check for this error code and treats it as a no error condition. This
allows the CRL data to fail for other reasons.

This patch does fail cert validation if the C_FindInit* calls return an error,
or the object is found, but an error is found in subsequent processing.

Our standard regression tests pass with this code.

This patch deals with the Cert lookup and Cert Validations paths, so it is
deemed high risk. It should be tested against Julian's tests, and ran in a
couple of existing products before it goes into RTM code. I recommend moving
this bug to NSS 3.7.
since the patch is riskier than the current behavior, moving out to 3.7.
Target Milestone: 3.6 → 3.7
Fixed checked into NSS 3.7. I'd like Julien to verify against his test case
before I close this, though.
Actually, I ran this with my test case yesterday. The test was the verification
of a revoked certificate.

Prior to this fix, the code would return an invalid SECItem, then the decoding
would fail, then the CRL code would do the same thing and assume revoked by default.

Now, the wrapper code returns a NULL SECItem, so the CRL code assumes there is
no CRL, and therefore the CRL validates.

I have adjusted the test case by adding a malloc() in NSSInit on my tree that
allocates RAM based on the environment variable "LEAK". This eats into OS/2's
limited process address space (about 300 MB actual due to shared memory, out of
the 512 MB theoritical). If I set LEAK to 10 MB, then the cert is considered
revoked because the CRL can be read/decoded.

If I set LEAK to 100 MB, there isn't enough memory, and no CRL is found, and the
cert actually verifies ...

We need some way of returning something other than a good or a NULL CRL. This is
why I was saying the API was broken. There is no way to report a failure ...
In the previous comment, in place of "same thing" read "safe thing".

Also, it's the cert that validates, not the CRL.
Sorry, I think I tested on the 3.6 branch, which doesn't have the fix. I am
building the tip now.
Even with the tip, it still fails.
I found that the code doesn't go through the code path that Bob thought.
Even though the last error set is indeed NSS_ERROR_NO_MEMORY, what happens is
that the error is never checked.

Inside of PK11_FindCrlByName, at line 3704, a non-NULL array is returned.
This causes the NSS_GetError() call that Bob added to be skipped.

However, the content of the array is empty, so the code goes on to the for loop
, which checks for interesting CRLs in the array. There are none, so it just
returns SEC_ERROR_CRL_NOT_FOUND.

Suggestions :
- if the array is completely empty, then perhaps this should be a fatal error
and we should not set "not found" ? I'll create a patch for this
- we probably should not be returning an empty array
Summary: PK11_FindCrlByName returns invalid SECItem when running low of memory → PK11_FindCrlByName returns invalid SECItem when running low on memory
This causes the correct behavior - the certs (both revoked and valid) are
considered invalid when a CRL cannot be successfully transferred from softoken
due to a low memory condition.
However, the error code reported at the highest level from
CERT_VerifyCertificate is "invalid arguments". The fix for that may not
necessarily be in the same code that was just patched but higher up.
Bob,

Could you please review my latest (patch 102646) ? I think this should go in the
tip and perhaps also in 3.6.1 .

Thanks.
That patch would work, though a simpler patch for the same effect would be to
change line 3726

from:

> if (!crls) {

to:

> if (!crls || *crls == NULL) {
>     if (crls) nssCrlArray_Destroy(crls);

bob
Comment on attachment 102646 [details] [diff] [review]
patch to consider an empty array an error and not set "not found" code

I just wanted to ask that you use the prevalent
comment block formatting style in the file.
Attached patch fix commentsSplinter Review
Fix comment style
Attachment #102646 - Attachment is obsolete: true
Moved to target milestone 3.8 because the original
NSS 3.7 release has been renamed 3.8.
Target Milestone: 3.7 → 3.8
The shorter version of the patch i proposed in the comment is now checked in to
the trunk.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: