PK11_FindCrlByName returns invalid SECItem when running low on memory

RESOLVED FIXED in 3.8

Status

NSS
Libraries
P1
normal
RESOLVED FIXED
16 years ago
15 years ago

People

(Reporter: Julien Pierre, Assigned: Robert Relyea)

Tracking

Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 2 obsolete attachments)

(Reporter)

Description

16 years ago
I'm doing some work from home on my OS/2 machine this week-end, so I used the
same large cert7.db database that holds my 26 MB CRL. Much to my surprise, NSS
is unable to correctly lookup the CRL when verifying a certificate using
certutil -V.

Further investigation shows that PK11_FindCrlByName returns a non-NULL SECItem*,
with a data field of NULL, and a length of NULL. However, it also returns a
non-zero CK_OBJECT_HANDLE, indicating that it found a CRL.

I tried this on an NSS 3.5 tree and the same problem occurred. So this is not
related to the recent memory optimizations in 3.6.

When I do the same thing on Solaris, both in 3.5 and 3.6, there is no problem,
and the CRL is found. So it seems that this bug only shows on OS/2 so This might
be yet another DBM issue. I will continue investigating.
(Reporter)

Comment 1

16 years ago
FYI, PK11_LookupCrls, used for crlutil -L, works fine. It's only the lookup of
the CRL by subject that fails.

I traced the code some more and see that it's not DBM causing the problem - it
reads it fine. Somehow the object is getting lost when being returned to the
higher-level functions. This may be due to the excessive number of copies we do.
Despite being 32-bit, OS/2 has a per-process address space limit of 512MB. There
are other factors that reduce this limit, such as shared memory in use in the
OS. It is not impossible that I have hit this memory addressing limit in my
test, but I don't know for sure yet.

In fact, I just confirmed that this is the problem. The process was using more
than 300 MB of memory for some reason, and some calls to SECItem_DupItem were
returning NULL, and eventually this is what got returned to the top.

I disabled the arena free list by setting NSS_DISABLE_ARENA_FREE_LIST=1. Then
the memory usage of the process was much lower, and the CRL lookup succeeded !!!

Now for the interesting part about how I discovered this : I have a test that
checks for revocation status on a certificate listed on the CRL. On OS/2, the
test was returning that the certificate was valid !!!

What happened is that after getting the invalid SECItem, the higher-level code
tried to decode it. That failed. So it was unable to build a CERTSignedCrl*, and
returned NULL. Then SEC_CheckCrl believed that there was no CRL for the issuer,
and proceeded happily to validate the certificate.
Priority: -- → P1
Target Milestone: --- → 3.6
(Reporter)

Updated

16 years ago
OS: OS/2 → All
Hardware: PC → All
Summary: PK11_FindCrlByName returns invalid SECItem on OS/2 → PK11_FindCrlByName returns invalid SECItem when running low of memory
(Reporter)

Updated

16 years ago
Depends on: 160635
(Reporter)

Updated

16 years ago
Blocks: 162983
Created attachment 96752 [details] [diff] [review]
patch NSPR's plarena.c to only put blocks <= 100K on free list

This patch changes static function FreeArenaList in nsprpub/lib/ds/plarena.c
to free any arena block > 100KB rather than putting it on the free arena list.
This should provide some immediate reduction in memory usage when giant things
get allocated in an arena pool.

This patch also causes the blocks that are put on the free list to be cleared
and causes the pool's counter of arena blocks to be reduced by the number of
blocks moved to the free list.	  Review invited.  (I haven't tested this yet.)


I will also produce a patch that makes the arena free list be sorted in
ascending order of free block size.  This should make the first-fit allocation
algorithm perform as a best fit one.
Created attachment 96763 [details] [diff] [review]
patch creates 4 freelists, sorted by size

This patch is an alternative to the previous one.  It keeps the freelist 
ordered by increasing arena block size, which causes the first-fit allocation
scheme to be a best-fit allocation scheme.  To shorten the time spent searching
the freelist for insertion or removal of a fitting block, the freelist is
broken into 4 lists, all protected by the same single lock.  The 4 lists are
used for different sizes of blocks.  The ranges of sizes for the 4 lists are:
    0k < x < 2k
    2k < x < 8k
    8k < x < 32k
   32k < x
comments invited.  Code is untested.
Oops. That last comment should have said:

    0k < x <= 2k
    2k < x <= 8k
    8k < x <= 32k
   32k < x 
(Reporter)

Comment 5

16 years ago
Nelson,

I can't compile your patch. I think you made it against a different version of
NSPR than the one I'm using.
I am using the tip of NSPR.

I tried to rev plarena.c to release 3.6.2.1 which you patched, but I still get a
compile error :

[e:\dev\nss\36\mozilla\nsprpub\os22.45_icc_dbg.obj\lib\ds]gmake
icc -Foplarena.obj -c   /Q /W1 /Q /qlibansi /Gd+ /Gm+ /Su4 /Mp /Tl9 /Ti+    -UND
EBUG -DDEBUG_charm   -DDEBUG=1  -DXP_OS2=1  -DXP_PC=1  -DBSD_SELECT=1  -D_PR_GLO
BAL_THREADS_ONLY=1  -DXP_OS2_VACPP=1  -DOS2=4  -DTCPV40HDRS=1  -D_X86_=1  -DHAVE
_STRERROR=1   -DFORCE_PR_LOG -IE:/DEV/NSS/36/mozilla/security/nss/../../dist/OS2
2.45_icc_DBG.OBJ/include -I../../../pr/include  ../../../lib/ds/plarena.c
../../../lib/ds/plarena.c(306:38) : error EDC0068: Operation between types "unsi
gned long" and "struct PLArena*" is not allowed.
../../../lib/ds/plarena.c(306:42) : error EDC0045: Undeclared identifier base.
E:/DEV/MOZTOOLS/GMAKE.EXE: *** [plarena.obj] Error 12

[e:\dev\nss\36\mozilla\nsprpub\os22.45_icc_dbg.obj\lib\ds]

Please give me a patch for the NSPR tip.

I have a good test case to reproduce this problem - my cert verification test
program runs out of memory when the NSS arena free list is enabled, because it
hits the OS/2 512 MB process limit. It only runs successfully with the free list
disabled. So if I could get the test to work with your patch and the free list
enabled, that would be good progress.

I feel that the arena enhancement should probably be the subject of another bug
though, on NSPR, so feel free to open one.

In this bug, I will track down the place where we try to allocate memory and it
fails without us catching it.
(Reporter)

Comment 6

16 years ago
The problem occurs in softoken/pcertdb.c, line 4824 .
The SECITEM_DupItem returns NULL. This error is ignored, and the function
returns SECSuccess .
(Reporter)

Comment 7

16 years ago
Unfortunately, even checking for this condition and returning SECFailure doesn't
fix the problem.
This is a design issue - this API wasn't designed to support failures.
It was supposed to either return objects or no objects, but no in between, such
as "not enough memory to return the object that is present" which is this case.
So we get stuck with bogus data being returned.

However, this is an internal API, so it could be fixed with a better prototype,
as well additional checking in some of the layers. I'm afraid the stack is
rather big when this happens, though :

 Function                          | Part            
-----------------------------------+-----------------
 nsslowcert_FindCrlByKey           | PCERTDB.OBJ     
 pk11_getCrl                       | PKCS11U.OBJ     
 pk11_FindCrlAttribute             | PKCS11U.OBJ     
 pk11_FindTokenAttribute           | PKCS11U.OBJ     
 pk11_FindAttribute                | PKCS11U.OBJ     
 NSC_GetAttributeValue             | PKCS11.OBJ      
 nssCKObject_GetAttributes         | CKHELPER.OBJ    
 nssCryptokiCRL_GetAttributes      | CKHELPER.OBJ    
 nssCRL_Create                     | CERTIFICATE.OBJ 
 crl_createObject                  | PKIBASE.OBJ     
 nssPKIObjectCollection_GetObjects | PKIBASE.OBJ     
 nssPKIObjectCollection_GetCRLs    | PKIBASE.OBJ     
 nssTrustDomain_FindCRLsBySubject  | TRUSTDOMAIN.OBJ 
 PK11_FindCrlByName                | PK11CERT.OBJ    
 SEC_FindCrlByKeyOnSlot            | CRL.OBJ         
 DPCache_Fetch                     | CRL.OBJ         
 DPCache_Update                    | CRL.OBJ         
 AcquireDPCache                    | CRL.OBJ         
 CERT_CheckCRL                     | CRL.OBJ         
 SEC_CheckCRL                      | CERTVFY.OBJ     
 cert_VerifyCertChain              | CERTVFY.OBJ     
 CERT_VerifyCertificate            | CERTVFY.OBJ     
 VerifyCert                        | CERTUTIL.OBJ    
 ValidateCert                      | CERTUTIL.OBJ    
 main                              | CERTUTIL.OBJ    
 _start                            | EXESTRTI        
 0x1FFECE33                        | DOSCALL1.DLL:4  
(Assignee)

Comment 8

16 years ago
The problem is that nssCKObject_GetAttributes() has to handle the case where we
may be asking for attributes that are not critical and may not be understood by
the higher level. This means the CRL that gets created has a NULL value.

Somewhere this value needs to be checked for validity, and if that check fails
we need to fail a crl validiation.

Changing the error code coming back from softoken won't affect this.
Comment on attachment 96763 [details] [diff] [review]
patch creates 4 freelists, sorted by size

This patch contains a syntax error.  The replacement patch will be added to bug
166701
Attachment #96763 - Attachment is obsolete: true
Attachment #96763 - Flags: needs-work+

Comment 10

16 years ago
Assigned the bug to Bob.
Assignee: wtc → relyea
(Assignee)

Comment 11

16 years ago
Created attachment 101207 [details] [diff] [review]
patch to surface errors in lookups up.

The basic idea of this patch is that error types are sent up using
PORT_SetError(). In our search routine, if we simply don't find what we are
looking for, we set the error NOT_FOUND. Upper level search codes, when no CRL
is found check for this error code and treats it as a no error condition. This
allows the CRL data to fail for other reasons.

This patch does fail cert validation if the C_FindInit* calls return an error,
or the object is found, but an error is found in subsequent processing.

Our standard regression tests pass with this code.

This patch deals with the Cert lookup and Cert Validations paths, so it is
deemed high risk. It should be tested against Julian's tests, and ran in a
couple of existing products before it goes into RTM code. I recommend moving
this bug to NSS 3.7.
(Assignee)

Comment 12

16 years ago
since the patch is riskier than the current behavior, moving out to 3.7.
Target Milestone: 3.6 → 3.7
(Assignee)

Comment 13

16 years ago
Fixed checked into NSS 3.7. I'd like Julien to verify against his test case
before I close this, though.
(Reporter)

Comment 14

16 years ago
Actually, I ran this with my test case yesterday. The test was the verification
of a revoked certificate.

Prior to this fix, the code would return an invalid SECItem, then the decoding
would fail, then the CRL code would do the same thing and assume revoked by default.

Now, the wrapper code returns a NULL SECItem, so the CRL code assumes there is
no CRL, and therefore the CRL validates.

I have adjusted the test case by adding a malloc() in NSSInit on my tree that
allocates RAM based on the environment variable "LEAK". This eats into OS/2's
limited process address space (about 300 MB actual due to shared memory, out of
the 512 MB theoritical). If I set LEAK to 10 MB, then the cert is considered
revoked because the CRL can be read/decoded.

If I set LEAK to 100 MB, there isn't enough memory, and no CRL is found, and the
cert actually verifies ...

We need some way of returning something other than a good or a NULL CRL. This is
why I was saying the API was broken. There is no way to report a failure ...
(Reporter)

Comment 15

16 years ago
In the previous comment, in place of "same thing" read "safe thing".

Also, it's the cert that validates, not the CRL.
(Reporter)

Comment 16

16 years ago
Sorry, I think I tested on the 3.6 branch, which doesn't have the fix. I am
building the tip now.
(Reporter)

Comment 17

16 years ago
Even with the tip, it still fails.
I found that the code doesn't go through the code path that Bob thought.
Even though the last error set is indeed NSS_ERROR_NO_MEMORY, what happens is
that the error is never checked.

Inside of PK11_FindCrlByName, at line 3704, a non-NULL array is returned.
This causes the NSS_GetError() call that Bob added to be skipped.

However, the content of the array is empty, so the code goes on to the for loop
, which checks for interesting CRLs in the array. There are none, so it just
returns SEC_ERROR_CRL_NOT_FOUND.

Suggestions :
- if the array is completely empty, then perhaps this should be a fatal error
and we should not set "not found" ? I'll create a patch for this
- we probably should not be returning an empty array
(Reporter)

Updated

16 years ago
Summary: PK11_FindCrlByName returns invalid SECItem when running low of memory → PK11_FindCrlByName returns invalid SECItem when running low on memory
(Reporter)

Comment 18

16 years ago
Created attachment 102646 [details] [diff] [review]
patch to consider an empty array an error and not set "not found" code

This causes the correct behavior - the certs (both revoked and valid) are
considered invalid when a CRL cannot be successfully transferred from softoken
due to a low memory condition.
However, the error code reported at the highest level from
CERT_VerifyCertificate is "invalid arguments". The fix for that may not
necessarily be in the same code that was just patched but higher up.
(Reporter)

Comment 19

16 years ago
Bob,

Could you please review my latest (patch 102646) ? I think this should go in the
tip and perhaps also in 3.6.1 .

Thanks.
(Assignee)

Comment 20

16 years ago
That patch would work, though a simpler patch for the same effect would be to
change line 3726

from:

> if (!crls) {

to:

> if (!crls || *crls == NULL) {
>     if (crls) nssCrlArray_Destroy(crls);

bob

Comment 21

16 years ago
Comment on attachment 102646 [details] [diff] [review]
patch to consider an empty array an error and not set "not found" code

I just wanted to ask that you use the prevalent
comment block formatting style in the file.
(Reporter)

Comment 22

16 years ago
Created attachment 104069 [details] [diff] [review]
fix comments

Fix comment style
Attachment #102646 - Attachment is obsolete: true

Comment 23

15 years ago
Moved to target milestone 3.8 because the original
NSS 3.7 release has been renamed 3.8.
Target Milestone: 3.7 → 3.8
(Assignee)

Comment 24

15 years ago
The shorter version of the patch i proposed in the comment is now checked in to
the trunk.
Status: NEW → RESOLVED
Last Resolved: 15 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.