Closed Bug 397832 Opened 17 years ago Closed 16 years ago

libpkix leaks memory if a macro calls a function that returns an error

Categories

(NSS :: Libraries, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: alvolkov.bgs, Assigned: alvolkov.bgs)

References

Details

(Keywords: memory-leak, Whiteboard: PKIX)

Attachments

(14 files, 17 obsolete files)

14.50 KB, patch
nelson
: review+
Details | Diff | Splinter Review
12.95 KB, patch
nelson
: review+
Details | Diff | Splinter Review
80.23 KB, patch
nelson
: review+
Details | Diff | Splinter Review
2.71 KB, patch
nelson
: review+
Details | Diff | Splinter Review
1.33 KB, patch
nelson
: review+
Details | Diff | Splinter Review
31.15 KB, patch
nelson
: review+
Details | Diff | Splinter Review
39.36 KB, patch
nelson
: review+
Details | Diff | Splinter Review
1.05 KB, patch
nelson
: review+
Details | Diff | Splinter Review
47.04 KB, patch
nelson
: review+
Details | Diff | Splinter Review
2.02 KB, patch
nelson
: review+
Details | Diff | Splinter Review
4.96 KB, patch
nelson
: review+
Details | Diff | Splinter Review
758 bytes, patch
nelson
: review+
Details | Diff | Splinter Review
2.10 KB, patch
nelson
: review-
Details | Diff | Splinter Review
3.90 KB, patch
nelson
: review-
Details | Diff | Splinter Review
PKIX_CHECK_ONLY_FATAL macro is defined in a way so that it immediately abort execution and returns received error if executed by this macro function returned a fatal error.

The macro should be fixed to at least have "goto cleanup" instruction.
Whiteboard: PKIX
Priority: -- → P2
The problem is bigger than just the PKIX_CHECK_ONLY_FATAL macro, and 
going to cleanup will not fix the leaks (by itself).  

Consider, for example, the case in function pkix_Build_InitiateBuildChain(
(which code may have been moved to function pkix_Build_CheckInCache by the
time you read this).  If function pkix_Build_ValidateEntireChain returns 
a FATAL error, that code will potentially leak 4 objects because it will 
bypass the following 4 calls:

        PKIX_DECREF(state->reversedCertChain);
        PKIX_DECREF(state->checkedCritExtOIDs);
        PKIX_DECREF(state->checkerChain);
        PKIX_DECREF(state->revCheckers);

Changing the macro to go to cleanup will not fix those leaks.

Perhaps we should make sure that no libPKIX function ever outputs an
object reference to its caller, and THEN gives a "fatal" error code.
The bug should be renamed to become more general. The reason - most of the macros defined by libpkix check the returned value(which is pointer to PKIX_Error in most of the cases) from commands that they called. The macro immediately call "return" from a caller function if the returned value is not equal to NULL. 

One of the example if PKIX_DECREF: 

 #define PKIX_DECREF(obj) \
    do { \
	if (obj){ \
	    pkixTempResult = PKIX_PL_Object_DecRef \
			((PKIX_PL_Object *)(obj), plContext); \
	    if (pkixTempResult) \
		return pkixTempResult; \
	    obj = NULL; \
	} \
    } while (0)

It check for pkixTempResult and return if it is NULL. If we consider Nelson's example in the comment 1, we will leak 3 out of four objects if first PKIX_DECREF failed. 
Summary: libpkix leaks memory if PKIX_CHECK_ONLY_FATAL is used and called a function that returned an error → libpkix leaks memory if a macro calls a function that returns an error
return in macros except in very local (function-local macro) cases is a botch. What would happen if you first purged all return-in-macro uses?

/be
Flags: blocking1.9?
Brendan,
There was apparently a plan in which Return in macros was not a botch, but 
other aspects of the implementation were botched.  The design was that all 
the allocations done in the libPKIX code for a single cert path validation 
would come from an arenapool that would be allocated at the beginning of 
the path validation operation, and would be discarded at the end of the 
operation.  So the plan was that no memory would actually be leaked.
There is a lot of code in pre-PKIX NSS that works that way, quite 
successfully (without leaks).

But, somewhere between the plan and the coding of libPKIX things went awry.  
Instead of an arenapool for each operation, there is now one global arenapool, 
that is never freed.  

The present plan to resolve this is to fix the code to work as it was 
designed to work, relying on short-lived arenapools to avoid leaks.  
We know that long lived results will need to be copied out of the short-lived
arenapool, or else not allocated from it in the first place.  
Someone is now investigating to see if there are other problems with that.
In other words, if we revert to the original plan, what else will go wrong?
We'll know soon enough.
Propose a plan to overcome error condition memory leaks for discussion:

First of all, I assume that pkix library does not allocate memory from any arena. Arena approach for some libpkix objects can be introduced later without any harm to the project, since the libpkixcode has already a mechanism that will avoid deallocation of a memory that was allocated on arena.

This is going to be a big cleanup for pkix_tools.h and pkix_pl_object.c code, but I think that it will not take too much time to implement this approach.   

1. _FATAL macros shold no longer return from a caller function. The fatal error will be treated as any other errors, but we will make sure that we assert in debug build when such error happens.

2. Other macros from pkix_tools.h will be changed. return statement will be removed from macros. Macro generated "local"  errors will be treated the same way as a general errors in optimized build, and will be asserted in debug builds. To avoid leaking PKIX_Error objects generated by sequentially executed macros, errors will be added to the list of errors that will be available in function context. The first received error will be reported to the upper level functions.

3.  Review all functions that calls _FATAL macros (~40 macro invocations) to check for possible object leakage. These are the only macros that intentionally interrupt a function code execution without any cleanup.

Comments?

I'd' estimate this work to be done within 5 working days.
Alexei, you appear to be saying that you believe that only the "fatal" errors
create any leaks, and that all other non-fatal errors do not leak.  
Are you indeed saying that?  
That is correct. If any other errors occurred, execution will jump to "cleanup" where a function suppose to have proper code object decref/memory deallocation. In fact it is mandatory(you'll get compiler error otherwise) to have "cleanup" label if a function suppose to generate an error condition.
Attached patch Initial patch (obsolete) — Splinter Review
Fixes issues related to memory leaks due to premature return. Fix macros, error handling functions. Biggest impact to pkix_build.c(main chain building functionality) and pkix_tools.c (macros definition).
Assignee: nobody → alexei.volkov.bugs
Status: NEW → ASSIGNED
Attachment #287188 - Flags: review?(nelson)
Comment on attachment 287188 [details] [diff] [review]
Initial patch

OK, here is some preliminary review feedback.

This megapatch is has parts that are separately reviewable.

For many files, the *only* change was to add cleanup: labels
to many functions, and add or remove blank lines.  
r+ for the patches to those files.  Please go ahead and 
commit the changes to those files now (ONLY the files where
the only changes are adding cleanup: and playing with 
blank lines).  Then submit a new patch which is the subset
of this megapatch for the files that you didn't commit.  
I'll continue my review of that new patch, which hopefully
will be a lot smaller.
Attached patch Patch v2. (obsolete) — Splinter Review
Main patch. Changes to files that have only addition of "cleanup" labels are already integrated.
Attachment #287188 - Attachment is obsolete: true
Attachment #287227 - Flags: review?(nelson)
Attachment #287188 - Flags: review?(nelson)
There are three main patterns of the code, that leak objects during error conditions:

1. _Create functions create an object, but do not destruct it in case of error in subsequent code. Example is the code at PKIX_CertChainChecker_Create function. If either of PKIX_INCREF fails, cert chain checker object will be leaked.

2. Use of macros that goto cleanup label in the blocks of code that follows "cleanup" label. Example is usage of PKIX_CHECK
at the end of the function pkix_BuildForwardDepthFirstSearch.

3. Object is lost if the code controlled by PKIX_CHECK macro(for example) executed after a function that create an object but before a reference to an object saved for future references. Example of such code can be found in pkix_build.c:

    after PKIX_PL_AIAMgr_Create is called at line 4034, the created object reference is not saved until line 4102:

    4102             state->buildConstants.aiaMgr = buildConstants.aiaMgr;

Line 4055 where buildConstants.aiaMgr get set to an address in aiaMgr should not be considered since buildConstants is allocated on the stack and will not be destructed.
Depends on: 402727
Comment on attachment 287227 [details] [diff] [review]
Patch v2.

This megapatch contains separable pieces.  I will provide separate reviews
for the pieces.  Some of the pieces will get r+ here, and can be checked
in immediately (please do).  Other parts will get r-, and others await
further review.  Overall, the megapatch gets r-

r+ for the changes to the following files:
+ mozilla/security/nss/lib/libpkix/pkix/crlsel/pkix_crlselector.c
+ lib/libpkix/pkix_pl_nss/module/pkix_pl_ldapcertstore.c  
+ lib/libpkix/pkix_pl_nss/module/pkix_pl_ldapdefaultclient.c
+ lib/libpkix/pkix_pl_nss/module/pkix_pl_pk11certstore.c
+ lib/libpkix/pkix_pl_nss/system/pkix_pl_common.c
The changes to these files can be committed immediately.  Please do.

I will send the rest of the review comments in email.
Attachment #287227 - Flags: review?(nelson) → review-
Attachment #287915 - Flags: review?(nelson)
Attached patch pkix_pl_hashtable.c chages (obsolete) — Splinter Review
Attachment #287227 - Attachment is obsolete: true
Attachment #288067 - Flags: review?(nelson)
Leaks are bad.  +'ing.
Flags: blocking1.9? → blocking1.9+
Attachment #288067 - Attachment is obsolete: true
Attachment #288067 - Flags: review?(nelson)
Previous patch was obsoleted due to problems in the nss build.
Attachment #288082 - Flags: review?(nelson)
Comment on attachment 287915 [details] [diff] [review]
leaks in PKIX_ProcessingParams. v1

This patch modifies 4 files.  The modifications are independent.
r+ for three of the four files.  r+ for these files:

lib/libpkix/pkix/params/pkix_buildparams.c
lib/libpkix/pkix/params/pkix_trustanchor.c
lib/libpkix/pkix/params/pkix_valparams.c

Let's discuss the change to the remaining file
lib/libpkix/pkix/params/pkix_procparams.c
Comment on attachment 287915 [details] [diff] [review]
leaks in PKIX_ProcessingParams. v1

I have previously given r+ for 3 of the files in this patch.
But now I must give r- for the changes to the file 
libpkix/pkix/params/pkix_procparams.c

There is one problem that occurs in most of the functions modified
in this patch.  It occurs after the label "cleanup", in code that 
looks like this:

>-        if (PKIX_ERROR_RECEIVED){
>-                PKIX_DECREF(date);
>+        if (PKIX_ERROR_RECEIVED) {
>+            PKIX_DECREF(params->date);
>         }

The problem is that, after label cleanup, the variable params can be
NULL, because there are error paths that goto cleanup if params is NULL.
So all the calls to PKIX_DECREF that pass a value in the params struct
must first check to see that params is not NULL, e.g. 

>+        if (PKIX_ERROR_RECEIVED && params) {
>+            PKIX_DECREF(params->date);
>         }
Attachment #287915 - Flags: review?(nelson) → review-
Blocks: 403470
Hi Alexei, I'm checking the history of changes in Tindebox + resolving failures in nightly tests. Seems that your patch from 2007-11-09 17:45 related to this bug causes many failures (2664/3558 Passed on most platforms). Please fix this ASAP. 
Slavo, integration of changes to pkix_pl_pk11certstore.c introduce a regression: arena was deallocated twice in one of the function pkix_pl_Pk11CertStore_CertQuery. Fix is integrated.
Nice catch, Nelson. This patch fixes the problem.
Attachment #287915 - Attachment is obsolete: true
Attachment #288359 - Flags: review?(nelson)
Comment on attachment 288082 [details] [diff] [review]
Localize the scope of PKIX_MUTEX_LOCK/UNLOCK macros

r=nelson
Attachment #288082 - Flags: review?(nelson) → review+
Comment on attachment 288359 [details] [diff] [review]
leaks in PKIX_ProcessingParams. v2

r=nelson
Attachment #288359 - Flags: review?(nelson) → review+
Attachment 288359 [details] [diff] has been committed.
Depends on: 399300
Attached patch Main patch v3 (obsolete) — Splinter Review
Attachment #288699 - Flags: review?(nelson)
patch 288082 has been integrated.
Comment on attachment 288699 [details] [diff] [review]
Main patch v3

r-.  I gave the review comments to Alexei over the phone.
Attachment #288699 - Flags: review?(nelson) → review-
Attached patch Main patch v4Splinter Review
Attachment #288699 - Attachment is obsolete: true
Attachment #288930 - Flags: review?(nelson)
Version: 3.12 → trunk
Comment on attachment 288930 [details] [diff] [review]
Main patch v4

Alexei, Your latest patch includes this:

>+/* This local error check macro */
>+#define ERROR_CHECK(errCode) \
>+    if (pkixErrorResult) { \
>+        pkixTempErrorReceived = PKIX_TRUE; \
>+        pkixErrorClass = pkixErrorResult->errClass; \
>+        if (pkixErrorClass == PKIX_FATAL_ERROR) { \
>+            goto cleanup; \
>+        } \
>+        if (verifyNode) { \
>+            PKIX_INCREF(pkixErrorResult); \
>+            verifyNode->error = pkixErrorResult; \
>+            pkixErrorCode = errCode; \
>+        } \
>+        goto cleanup; \
>+    }

I think that last part should be:

>+        if (verifyNode) { \
>+            PKIX_INCREF(pkixErrorResult); \
>+            verifyNode->error = pkixErrorResult; \
>+        } \
>+        pkixErrorCode = errCode; \
>+        pkixErrorMsg = PKIX_ErrorText[errCode];
>+        goto cleanup; \
>+    }

Do you agree?
I agree. An error code should be set regardless of verifyNode value.
Comment on attachment 288930 [details] [diff] [review]
Main patch v4

r=nelson, provided that you make the change we discussed in comment 29.

I have some concerns with pkix_BuildForwardDepthFirstSearch(), but 
I'm not going to hold up this patch for those things, because I know 
there's more work to be done in this function later.
Attachment #288930 - Flags: review?(nelson) → review+
Attachment 288930 [details] [diff] has been integrated.
Did this get checked-in? If so, can the bug be resolved as fixed?
Attachment #291570 - Flags: review?(nelson)
Comment on attachment 291570 [details] [diff] [review]
One more place where PKIX_Error is leaked

r+ because the new code seems correct.  But please explain to me the path
through pkix_ForwardBuilderState_Create that you're trying to correct.
Was it a leak?  Or a double-free?
Attachment #291570 - Flags: review?(nelson) → review+
It was a leak of PKIX_Error object. pkixTempResult is not checked before exit and its value is overridden with a first macro code.
attachment 291570 [details] [diff] [review] in on the trunk.
PKIX_Error object will be leaked only if pkix_VerifyNode_Create failed and return an error. The execution path: 
->PKIX_BuildChain
    ->pkix_Build_InitiateBuildChain
          ->pkix_VerifyNode_Create
          <-returns an error
       define new FATAL error and leak the error
       that was returned by pkix_VerifyNode_Create.
pkix_VerifyNode_Create will leak PKIX_VerifyNode object pointed by "node" if error occurred after PKIX_PL_Object_Alloc call.
Attachment #291719 - Flags: review?(nelson)
Alexei, your explanation in comment 38 makes no mention of function
pkix_ForwardBuilderState_Create.  Please explain the leaking path of 
execution in that function.
Comment on attachment 291719 [details] [diff] [review]
Fix in pkix_VerifyNode_Create

r=nelson
Attachment #291719 - Flags: review?(nelson) → review+
(In reply to comment #40)
> Alexei, your explanation in comment 38 makes no mention of function
> pkix_ForwardBuilderState_Create.  Please explain the leaking path of 
> execution in that function.
> 

Oh, I see why you have questions about pkix_ForwardBuilderState_Create. The fix to this function make it not to leak created state in case of error, but has nothing to do with a leak of PKIX_Error object in function pkix_Build_InitiateBuildChain. Sorry for confusion.
Fix object leaks that could happen in <object>_Create function due to occurred error conditions.
Attachment #291756 - Flags: review?(nelson)
attachment 291719 [details] [diff] [review] has been committed.
(In reply to comment #43)
> Created an attachment (id=291756) [details]
> Object leaks in <object>_Create functions.
> 
> Fix object leaks that could happen in <object>_Create function due to occurred
> error conditions.
A bug was found during review of pkix_pl_CRL_CreateToList. Opening new bug 407064 since the bug is unrelated to this patch. 
Comment on attachment 291756 [details] [diff] [review]
Object leaks in <object>_Create functions.

r=nelson
Attachment #291756 - Flags: review?(nelson) → review+
This patch fixes the last set of known object leaks.

What remains are memory leaks in pkix_pl_cert.c what will be fixed by consequent patch.
Attachment #293596 - Flags: review?(nelson)
Comment on attachment 293596 [details] [diff] [review]
One more set of objects leaks.

r+=nelson, with one caveat: I can't vouch for the correctness of function 
pkix_BuildForwardDepthFirstSearch.
It's too big, too long, with too many local variables for mere mortals to 
analyze.  Let's put on our list of lower priority cleanup work to decompose
that monster into smaller functions whose correctness is verifiable.
Attachment #293596 - Flags: review?(nelson) → review+
I've opened a bug 402733 to trace leaks in pkix_build.c because of it's couple monstrous functions, pkix_BuildForwardDepthFirstSearch in particular. So the bug should be used to clean up pkix_build.c code.

patch in attachment 293596 [details] [diff] [review] has been committed.
Attachment #299354 - Flags: review?(nelson)
Comment on attachment 299354 [details] [diff] [review]
Leak in pkix_pl_OcspRequest_Create

r=nelson
Attachment #299354 - Flags: review?(nelson) → review+
attachment 299354 [details] [diff] [review] has been integrated.
Comment on attachment 299354 [details] [diff] [review]
Leak in pkix_pl_OcspRequest_Create

Sorry, but I think this patch causes a crash. More info in the next comment (this patch edit field ist just too small)
I was testing around EV and OCSP.
My test case was simple: Visit the paypal web site.

Using Firefox trunk + NSS Beta 1
 => works fine

Using Firefox trunk + NSS trunk 
 => crashes during the cleanup of pkix ocsp request

Using Firefox trunk + NSS trunk + attachment 299354 [details] [diff] [review] backed out
 => works fine

Please let me know if you would like a separate bug report for this, but I propose to simply back this out?

In reply to comment 54:
Kai, do you know about the form text resizer bookmarklet? 
It will let you resize any form text entry box.  Very handy.

In reply to comment 55:
Please file a separate bug.  I believe this patch is correct.
The crash is due another bug, a latent bug that was formerly masked by the 
memory leak.  Correcting the memory leak reveals that other latent bug.

There's a big block comment in the function being patched here.  It warns
of a double free, unless the coder is "careful".  If your crash is in free,
or malloc, then I suspect the problem is a double free of that CERTCertificate 
object.  The writer of that comment apparently did not know how to "addref" 
an NSS CERTCertificate object.  That is done by calling CERT_DupCertificate.  
What's the status here? Is this bug fixed or was the patch backed out. Note that if multiple code-paths think they own the memory and free it, there's a good chance of exploitable crashes.
I can no longer reproduce the crash.

In other words, the latest trunk appears to work fine.
So the patch was landed? Can this bug be marked fixed? We no longer leak?
(In reply to comment #59)
> So the patch was landed? Can this bug be marked fixed? We no longer leak?

Yes, Alexei had checked this patch in, see comment 53.

I don't know what fixed the crash, maybe some other check in fixed it. As Nelson had said in comment 56 the cause of the crash was probably elsewhere.


I don't know if this bug is ready to be marked fixed.
Jonas, A great many leaks have been fixed, and the fixes for those leaks are
now present in the drop of NSS being used presently in trunk browser builds.
We are not yet prepared to say that ALL leaks are fixed.  
But it may be that all (or the vast majority) of the leaks that were 
experienced by the browser are fixed.  Only a browser person can determine 
that.  If you conclude that the leaks in libPKIX are no longer a major issue
for the browser, then feel free to remove the blocking flag from this bug.
The NSS team did not set it, and the NSS team will not clear it.
Attached patch Object leak test(patch v1) (obsolete) — Splinter Review
Code and script to check object/mem leaks in libpkix.
Some object leak fixes.
Attachment #305807 - Flags: review?(nelson)
Comment on attachment 305807 [details] [diff] [review]
Object leak test(patch v1)

Slavo, please review the part of the patch which modifies test scripts. In particular, I'm looking for review of the memory leak work I've done in libpkix.sh
Attachment #305807 - Flags: review?(slavomir.katuscak)
Alexei, I have some comments to your patch:
1. I don't understand why half of all.sh is removed. Is it only change you're using on your test machine ? This affects also other parts of testing and cleanup is missing there (should close HTML page).
2. libpkix_leak_test() - when extraOpt is set once, it's never cleaned up and is applied for all next while iterations, I see that you read data from vfychain_test.lst, but I don't see this file in CVS or in patch, so I can't check the content
3. libpkix_setup_db() - maybe using OUTPUT=`certutil ... 2>&1` would be faster then temporary file (in case you don't reach limit of the content size of shell variable, see bug 400103 comment 2), but this is not a critical
4. On some places you use checking if [ ${..} ], would be better to use -n, also is see some if [ ! -z ${..} ] in libpkix.sh.
Attachment #305807 - Flags: review?(slavomir.katuscak) → review-
Comment on attachment 305807 [details] [diff] [review]
Object leak test(patch v1)

This patch apparently contains some unintended parts, so rather than review it,
I'll let you resubmit it with just the intended parts.
Please address Slavo's issues, except that I prefer the use of a temporary file
to relying on shell buffers being very large, so that point is OK as is.
Attachment #305807 - Flags: review?(nelson)
Comment on attachment 305807 [details] [diff] [review]
Object leak test(patch v1)

1. In source file lib/certhigh/certvfypkix.c Please move the declaration 
of this function:

> extern void
> cert_AddToVerifyLog(CERTVerifyLog *log, CERTCertificate *cert,
>                     unsigned long errorCode, unsigned int depth,
>                     void *arg);

out of this file and into the appropriate header file.  We really need to 
stamp out all extern function declarations in .c files!

2. In function cert_VerifyCertChainPkix, you added a label "start" near
the beginning, and then at the very end, before the final return rv,
you added a conditional goto that goes back to start.  Please replace
the label start with 
   do {
and replace the conditional goto with
   } while (errorGenerated);
Don't worry about the indentation.

BTW, PLEASE do not code any tests that read "== PR_TRUE".
Let's not write code that appears to have been written by newbies.
If they're booleans, just test them without comparison, e.g. rather than 
   if (foo == PR_TRUE)
code
   if (foo)

3.  Please, no use of stdin, stdout, stderr in libNSS.  So delete this:
>+    printf("Loop %d\n", memLeakLoopCount++);

4. This error is misspelled, and its meaning is unclear.
>+PKIX_ERRORENTRY(GENERATEDERRORRECEIVED, Recieve generated error,0),
                                             ^^
I have no idea what that means.  To what does "receive" refer?
Does that mean we received an error that was generated by something?  
Or that the act of receiving something generated an error?  

5. Here's another error code whose meaning is unclear.
PKIX_ERRORENTRY(NSSCERTIFICATEUSAGETOPKIXKUANDEKUFAILED,Nss certificate usage to pkix ku and eku failed,0),

6. Don't initialize external or static variables with zero values.
It's unnecessary because the c language guarantees all such variables are initialized to zero, and it increases the code file size.
Just leave them uninitialized.  Here are some examples, but please fix 
this throughout libPKIX code, starting with this patch.

>+PKIX_Boolean pkixInitInProgress = PKIX_FALSE;
>+PKIX_Boolean pkixShutdownInProgress = PKIX_FALSE;
>+PKIX_UInt32 stackPosition = 0;

7. In pkix_Throw, you added this line:
                      PKIX_INCREF(cause);
I don't understand why.  cause is not subsequently decref'ed.

9. MAJOR: New function pkix_ErrorGen_ProgressHash trashes the stack. 
The problem is this definition:

>+          PKIX_UInt32 *rvc = (PKIX_UInt32 *)&rv;

That needs to be an unsigned char pointer, not an int32 pointer.
Does this discovery invalidate any of the test results?

10.  regarding these lines:

> /* #define TEST_START_FN "pkix_BuildForwardDepthFirstSearch" */
> /* #define TEST_START_FN_STACK_POS 4 */
> #define TEST_START_FN "PKIX_BuildChain"
> #define TEST_START_FN_STACK_POS 2

a) please add comments explaining what these symbols are and do.
b) please if some sort of #if rather than commenting out those two
lines, or else just remove those two commented lines.

11. MAJOR: Your new definition of macro PKIX_STD_VARS really needs to be 
moved into a function.  We don't want all this code in every libPKIX
function.  But this issue can be fixed after this patch (or its successor)
is checked in.

12. MAJOR: In function pkix_pl_Cert_CreateToList, your patch appears to 
break this function.  If CERT_DecodeDERCertificate returns a cert, you 
toss it, but if it returns NULL, yo try to create a PKIX cert from NULL.

13. Your patch removes a call to PKIX_OBJECT_UNLOCK(lockedObject);
from PKIX_PL_Cert_GetSubjectPublicKey.  That doesn't seem right.
Let's discuss it.

14.  The reason for the changes in file 
lib/libpkix/pkix_pl_nss/system/pkix_pl_hashtable.c  are not apparent.
Please explain them to me, and then add a large comment to that file
explaining it.

15. Remove this comment:
  /* Turn this code on again after bug 397832 get fixed. */
because your patch "turns this code on again".

16. In function pkix_pl_Object_Destroy, you removed this line:
>-        objectHeader->magicHeader = 0;
why?  wasn't that there to detect references to destroyed objects?
Attachment #305807 - Flags: superreview-
Thank for review, Slavo

> 1. I don't understand why half of all.sh is removed. Is it only change you're
I've attached a wrong file. Disregard.  

> 2. libpkix_leak_test() - when extraOpt is set once, it's never cleaned up and
> is applied for all next while iterations, I see that you read data from
> vfychain_test.lst, but I don't see this file in CVS or in patch, so I can't
> check the content
Good catch.

Attached patch Object leak test(patch v2) (obsolete) — Splinter Review
Changes according review comments.
Attachment #305807 - Attachment is obsolete: true
Attachment #310352 - Flags: superreview?(nelson)
Attachment #310352 - Flags: review?
Attached patch Object leak test(patch v3) (obsolete) — Splinter Review
Changes according review comments.
Attachment #310352 - Attachment is obsolete: true
Attachment #310353 - Flags: superreview?(nelson)
Attachment #310353 - Flags: review?
Attachment #310352 - Flags: superreview?(nelson)
Attachment #310352 - Flags: review?
Attachment #310353 - Flags: review? → review?(slavomir.katuscak)
Comment on attachment 310353 [details] [diff] [review]
Object leak test(patch v3)

Alexei, please make a version of this patch that bugzilla can compare the patch v1.  The new patch should patch the same files as the old one, and the same file versions as the old one.  Any patches to any new files (not previously patched in the old patch) can be made into a separate patch.  
Alterantively, you can carry the old patch forward to current file versions, so that the two patches are comparable.
(In reply to comment #66)
> (From update of attachment 305807 [details] [diff] [review])
> 7. In pkix_Throw, you added this line:
>                       PKIX_INCREF(cause);
> I don't understand why.  cause is not subsequently decref'ed.
Code returns the cause as a newly created error. The count on reference
on the cause should be incremented. Otherwise we will get a dangled pointer
after exiting the pkix_Throw call.

> 9. MAJOR: New function pkix_ErrorGen_ProgressHash trashes the stack. 
Function is not used: removed.

> 12. MAJOR: In function pkix_pl_Cert_CreateToList, your patch appears to 
> break this function.  If CERT_DecodeDERCertificate returns a cert, you 
> toss it, but if it returns NULL, yo try to create a PKIX cert from NULL.
Nice catch. Thx!

> 13. Your patch removes a call to PKIX_OBJECT_UNLOCK(lockedObject);
> from PKIX_PL_Cert_GetSubjectPublicKey.  That doesn't seem right.
> Let's discuss it.
I've removed macro code and replace it with a function call. The
function calls PKIX_OBJECT_UNLOCK.

> 14.  The reason for the changes in file 
> lib/libpkix/pkix_pl_nss/system/pkix_pl_hashtable.c  are not apparent.
> Please explain them to me, and then add a large comment to that file
> explaining it.
Use of hashtable makes object leak testing impossible as it saves pointer
to cert,crl and cert chains. Disable ht functionality to the testing.
Attached patch Object leak test(patch v4) (obsolete) — Splinter Review
Patch that was made against 02/26/08 NSS code base.
Attachment #310353 - Attachment is obsolete: true
Attachment #310842 - Flags: review?(nelson)
Attachment #310353 - Flags: superreview?(nelson)
Attachment #310353 - Flags: review?(slavomir.katuscak)
Comment on attachment 310842 [details] [diff] [review]
Object leak test(patch v4)

This patch is only missing modification to mozilla/security/coreconf/config.mk file as following:
Index: coreconf/config.mk
===================================================================
RCS file: /cvsroot/mozilla/security/coreconf/config.mk,v
retrieving revision 1.22
diff -U 3 -p -r1.22 config.mk
--- coreconf/config.mk  29 Oct 2007 22:10:53 -0000      1.22
+++ coreconf/config.mk  20 Mar 2008 21:55:00 -0000
@@ -190,6 +190,10 @@ ifdef BUILD_LIBPKIX_TESTS
 DEFINES += -DBUILD_LIBPKIX_TESTS
 endif
 
+ifdef PKIX_OBJECT_LEAK_TEST
+DEFINES += -DPKIX_OBJECT_LEAK_TEST
+endif
+
Comment on attachment 310842 [details] [diff] [review]
Object leak test(patch v4)

Alexei, when I view the interdiff of this patch against the last patch I 
reviewed, which was patch v1, as seen in this URL
https://bugzilla.mozilla.org/attachment.cgi?oldid=305807&action=interdiff&newid=310842&headers=1
I see lots of problems.   Some lines are shown to be deleted that this patch 
actually adds.  Lines from .c files are shown as appearing in shell scripts.  
In short, I simply cannot rely on the interdiff output to tell me what really 
changed in this patch.  Please try again.
Attachment #310842 - Flags: review?(nelson) → review-
Attached patch Object leak test(Patch v5) (obsolete) — Splinter Review
Diff between v4 and v1 is messed up at the end of the representation: changes that were added into the v4 patch a shown as they were added to patch v1. But it easy to recognize the problem, since there is no dash line in between diff of the patches.
One more try...
Attachment #310842 - Attachment is obsolete: true
Attachment #311000 - Flags: review?(nelson)
Again, there are small number of changes(only two blocks at the and) that are incorrectly represented when you do diff of v5 and v1.

void cert_AddToVerifyLog

and

PKIX_DECREF(oid); 

are added to the patch version 5 and are not present at v1. But you can notice, that there is no dash line to the changes right side and so it is easy to spot.

The rest of the changes are shown correctly.
Also, the patch information from several different files is mixed together.
When I see that, I don't trust these interdiff results AT ALL.  Sorry.
This patch needs to be split into two parts, one part that patches ONLY 
the same files as patch v1, and a separate patch that patches any additional
files.  
Would it perhaps be easier for you to copy the relevant subset of your tree, apply v1 to one and vN to the other, and then do a diff locally?  Making Alexei work around bugzilla's silly interdiff limitations seems inefficient.
Attached patch Object leak patch(Patch v6) (obsolete) — Splinter Review
Attachment #311000 - Attachment is obsolete: true
Attachment #311069 - Flags: review?(nelson)
Attachment #311000 - Flags: review?(nelson)
Attached patch Object leak patch(Patch v7) (obsolete) — Splinter Review
Attachment #311069 - Attachment is obsolete: true
Attachment #311077 - Flags: review?(nelson)
Attachment #311069 - Flags: review?(nelson)
Attachment #311397 - Flags: review?(nelson)
Attached patch Object leak patch v8 (obsolete) — Splinter Review
To see why interdiff continued to complain that the patches v1 and v7 are
for different sets of files, I downloaded both patches and ran this command:

grep Index /tmp/397832v[17].txt | cut -f 2 -d ' ' | sort | uniq -c | grep 1

The output was 

   1 nss/lib/libpkix/pkix/top/pkix_build.c

So, I found the diff for that file in patch v7 and removed it, producing
patch v8.  

I then verified that the file version numbers in patches v1 and v8 were all
the same with this command:

grep '^---' /tmp/397832v[18].txt | cut -f 2-4 -d ':' | sort | uniq -c | \
grep '^   1'

Hopefully v1 and v8 will compare without any complaints about different 
versions of sets of files.
Attachment #311077 - Attachment is obsolete: true
Attachment #311077 - Flags: review?(nelson)
This patch contains the diffs for the files that were not included in v1
Attachment #311397 - Attachment is obsolete: true
Attachment #311397 - Flags: review?(nelson)
Comment on attachment 311421 [details] [diff] [review]
Patch v8 for additional files, not modified in v1

r+ for the changes in this small patch.
Attachment #311421 - Flags: review+
Comment on attachment 311420 [details] [diff] [review]
Object leak patch v8

thank you, Nelson. Diff between v1 and v8 works correctly now. Please review it.
Attachment #311420 - Flags: review?(nelson)
Alexei, You're supposedly on vacation today!  If not, call me.  :)

I'm not done with this review, but here are some preliminary review comments.

1. This patch adds more "extern" declarations of variables and/or functions
in .c files, rather than putting those declarations in header files where 
they belong.  See certvfypkix.c  pkix_lifecycle.c
I'm not asking (in this bug) that all existing extern declarations in .c 
files be removed. That's the subject of bug 424857. But I'm asking that
this bug/patch not add any new extern declarations to .c files.

2. Is is true that all this leak testing is single-threaded testing?
I ask because there are several aspects of this new leak testing code that
are not safe or correct for multi-threaded operation.  MAYBE that's OK if 
the libPKIX leak testing is all single threaded, but otherwise it's not.

examples:
a) The global variables used to track some internal state of a single cert
verification (on a single thread) in pkix_tools.c.  If multiple threads 
attempt cert verification simultaneously, they will step on each other.  

b) The new functions pkix_pl_lifecycle_ObjectTableUpdate and pkix_pl_lifecycle_ObjectLeakCheck make a copy of the global per-class object counts at the beginning of a verification, and then check that the copied numbers against the global values again at the end of the verification to 
see if they have changed, indicating a leak.  Here again, if multiple threads
are running, these results will be incorrect.  

If we decided that this code will be OK anyway, because the leak-checking 
builds are supposed to be used ONLY in single-threaded testing, then I'd 
say we need to put some code into leak checking builds that detects that 
multiple threads are using libPKIX and aborts if so.  
In reply to comment #82, it occurs to me, in retrospect that 
    uniq -u
is simpler and potentially more accurate than either 
>   uniq -c | grep 1
or 
>   uniq -c | grep '^   1'
(In reply to comment #84)
> (From update of attachment 311421 [details] [diff] [review])
> r+ for the changes in this small patch.

The patch is integrated.
Address review comments:
  move exports into pkix_tool.h
  disallow object leak test code to be built for optimized libraries.
  prevent multi-threaded object leak test runs.
Attachment #311420 - Attachment is obsolete: true
Attachment #311680 - Flags: review?(nelson)
Attachment #311420 - Flags: review?(nelson)
Comment on attachment 311680 [details] [diff] [review]
Object leak test(patch v9)

r=nelson
Attachment #311680 - Flags: review?(nelson) → review+
object leak test patch is integrated.
Attachment #314614 - Flags: review?(nelson)
Comment on attachment 314614 [details] [diff] [review]
Two more leaks in libpkix

r+
Attachment #314614 - Flags: review?(nelson) → review+
Comment on attachment 315629 [details] [diff] [review]
A few modifications to obj leak test code needed for tinderbox

stackErrorCodes array is added to printout error descriptions that caused object leak.
Added -f "${R_PWFILE}" argument to certutil to provide db password.
Comment on attachment 315629 [details] [diff] [review]
A few modifications to obj leak test code needed for tinderbox

This patch changes 3 files.  I'm giving r+ for the changes to the
files tests/libpkix/libpkix.sh and lib/libpkix/pkix/util/pkix_tools.h

I want to go over the changes to file lib/certhigh/certvfypkix.c 
with you in person, Alexei, before giving a final review to them.
Comment on attachment 315629 [details] [diff] [review]
A few modifications to obj leak test code needed for tinderbox

r=nelson
Attachment #315629 - Flags: review?(nelson) → review+
Patch 315629 is integrated.
Depends on: 429230
Attachment #311421 - Attachment is obsolete: true
All patches are integrated. Object leak tinderbox runs without indicating any problems. Closing the bug..
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
New leak is found in libpkix. Some modification are also need to object leak test to run tests through CERT_PKIXVerifyCert.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #316504 - Flags: review?(nelson)
Allows to printout a stack at which the error was generated that cause an object leak.
Adding paypal EE cert to be a part of the leak test.
Attachment #316505 - Flags: review?(nelson)
Attachment #316504 - Flags: review?(nelson) → review+
Comment on attachment 316505 [details] [diff] [review]
Additional modification to object leak test

Alexei, let's go over this patch together at your soonest convenience.
New bug for new patch is best. REOPEN and more patching should mean the patch before the bug was closed was backed out.

/be
Comment on attachment 316505 [details] [diff] [review]
Additional modification to object leak test

Will use bug 430135 for object leak improvements.
Attachment #316505 - Attachment is obsolete: true
Attachment #316505 - Flags: review?(nelson)
316505 is integrated. Closing
Status: REOPENED → RESOLVED
Closed: 16 years ago16 years ago
Resolution: --- → FIXED
Alexei, when you made this check in to this bug:

http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=alexei.volkov.bugs%25sun.com&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2008-04-08&maxdate=2008-04-10&cvsroot=%2Fcvsroot

you have done unrelated changes, reverted some of my tracing code.

In particular, you changed PR_LOGGING back to DEBUG_kaie and removed an PKIX_ENTER call.


@Christophe: This is the cause of the build error you saw.
Kai, I agree that the checkin did not match the patch that was reviewed,
which was attachment 314614 [details] [diff] [review].

Alexei is on vacation now until sometime in June, so if this needs to be
fixed before then, someone else must write a patch to undo the unreviewed
changes. 
Seems like a job for the reviewer, since it would be silly to escalate this to Alexei's CVS voucher.
Attached patch (wrong patch) (obsolete) — Splinter Review
I created a patch to undo the changes that were not part of the originally
reviewed patch, by backing out the committed changes, and then applying the 
original patch.  That patch is attached for reference, but I am giving it an
r- because I do not want to see it committed. In studying this patch, I see 
two differences:

1) some #ifdefs were changed from 
  #ifdef PR_LOGGING 
to 
  #ifdef DEBUG_kaie 

2) A call to PKIX_ENTER was removed from function  pkix_trace_dump_cert.

I understand the reason for the first of those changes.  It solved a real
problem.  I recall discussing it with several people at the time. 
It is possible that I was asked for and gave a verbal approval to make that change, as a temporary solution to the problem.  I do not recall clearly now, 
but had I been asked, I would have given that approval, and today I am willing
to retroactively approve that already-committed change for that purpose.  

The ifdef change was done to stop the very verbose logging output that was 
appearing in all our test runs and tinderbox runs, and was swelling the 
output log files considerably.  Other NSS developers found that they had 
difficulty finding the output they wanted in the midst of all that extra 
debug output.  Also, they didn't understand all that output. 
Changing that ifdef caused that debug output to cease in Tinderbox run logs, 
and in the test runs performed by the other NSS developers.  

The real issue there, IMO, is that that verbose debug output is enabled by
the presence of an environment variable that has another primary purpose, 
and is used in all our testing.  That environment variable was overloaded.
In libpkix/pkix_pl_nss/system/pkix_pl_lifecycle.c function PKIX_PL_Initialize 
we see:
171         if (PR_GetEnv("NSS_STRICT_SHUTDOWN")) {
172             pkixLog = PR_NewLogModule("pkix");
173         }

I would be very willing to undo the changes to those ifdefs at the same
time as the above environment variable is changed to some other, such as 
PKIX_VERBOSE_TRACE or something similar.  If someone would prepare a patch
that combines the one attached to this comment with a change to that 
environment variable, I would give it serious consideration.

The second change seen in the attached patch, the removal of the call to 
PKIX_ENTER in function pkix_trace_dump_cert, may have a similar technical 
justification.  I suspect that it was causing other PKIX logging and/or pkix 
error stack handling to receive lots of extra noise, even in the case where 
NSS_STRICT_SHUTDOWN was not defined.  Whatever the reason, I do not think
it presents a problem that needs urgent attention, and I would prefer to 
have Alexei tell us about it when he returns.  However, I am open to being
persuaded that this does need urgent attention, if a good reason can be given.
Attachment #322177 - Flags: review-
Sorry, that last attachment was just a copy of the previously reviewed patch.
This is the patch file that I made.
Attachment #322177 - Attachment is obsolete: true
Attachment #322178 - Flags: review-
Attachment #322177 - Attachment description: patch to undo unapproved changes (not for immediate checkin) → (wrong patch)
Attachment #322178 - Attachment is patch: true
(In reply to comment #110)
> The ifdef change was done to stop the very verbose logging output that was 
> appearing in all our test runs and tinderbox runs, and was swelling the 
> output log files considerably.

I do not remember having seen this.
Was there a bug filed?
It would have been good to cc me (I hope I didn't miss it).


> Other NSS developers found that they had 
> difficulty finding the output they wanted in the midst of all that extra 
> debug output.  Also, they didn't understand all that output. 
> Changing that ifdef caused that debug output to cease in Tinderbox run logs, 
> and in the test runs performed by the other NSS developers.  
> 
> The real issue there, IMO, is that that verbose debug output is enabled by
> the presence of an environment variable that has another primary purpose, 
> and is used in all our testing.  That environment variable was overloaded.
> In libpkix/pkix_pl_nss/system/pkix_pl_lifecycle.c function PKIX_PL_Initialize 
> we see:
> 171         if (PR_GetEnv("NSS_STRICT_SHUTDOWN")) {
> 172             pkixLog = PR_NewLogModule("pkix");
> 173         }
> 
> I would be very willing to undo the changes to those ifdefs at the same
> time as the above environment variable is changed to some other, such as 
> PKIX_VERBOSE_TRACE or something similar.  If someone would prepare a patch
> that combines the one attached to this comment with a change to that 
> environment variable, I would give it serious consideration.

I'm ok with the idea to use a new env var.

So, are we ok to keep having this wrapped in PR_LOGGING, assuming we switch to a separate env var?


> The second change seen in the attached patch, the removal of the call to 
> PKIX_ENTER in function pkix_trace_dump_cert, may have a similar technical 
> justification.  I suspect that it was causing other PKIX logging and/or pkix 
> error stack handling to receive lots of extra noise, even in the case where 
> NSS_STRICT_SHUTDOWN was not defined.

I don't know what the reason was, but without this PKIX_ENTER, the code does not compile at all.


> Whatever the reason, I do not think
> it presents a problem that needs urgent attention, and I would prefer to 
> have Alexei tell us about it when he returns.

Not urgent unless you want to compile it (as Christophe recently wanted to).

Why would it be bad to add an PKIX_ENTER statement, if this is wrapped inside our new debugging variable anyway? I suspect you have missed that it is wrapped.
Nelson, this is your patch, I'm giving r=kaie to your changes.
This adds your proposed environment variable.
Attachment #322633 - Flags: review?(nelson)
Comment on attachment 322633 [details] [diff] [review]
Patch v42: Revert and use env var PKIX_VERBOSE_TRACE

Kai, I may be mistaken, but I think that any function that calls PKIX_ENTER 
must also return by calling PKIX_RETURN, and must be a function that returns 
a PKIX_ERROR * (a ptr to a PKIX_ERROR).  I think that, if this rule is not 
followed, the error handling code may have problems.   

I suspect that what happened was this:
First, Alexei removed the PKIX_ENTER, since this function doesn't generally 
follow the PKIX library API conventions, and causes the error handling 
problems.  Then he saw that it didn't compile, so he changed the ifdef to 
remove it, rather than try to redesign it.  

I have no objection, in principle, to re-enabling this code in debug builds
provided that it is conditioned on an environment variable that is not used
in our QA test runs (including Tinderbox).  I'm giving this review request
to Alexei, who can tell us more about the requirements for PKIX_ENTER.
Attachment #322633 - Flags: review?(nelson) → review?(alexei.volkov.bugs)
Comment on attachment 322633 [details] [diff] [review]
Patch v42: Revert and use env var PKIX_VERBOSE_TRACE

I was right that, without the PKIX_RETURN macro, function pkix_trace_dump_cert will leak any PKIX_ERRORs that are returned by 
the other libPKIX functions it calls.  Also, with the PKIX_ENTER macro 
in place, every call to this function will log the fact that this 
function was called.  So, I have filed bug 436949 to track the work to 
fix function pkix_trace_dump_cert and restore it (change the ifdef).  

This bug (Bug 397832) was fixed in 3.12.0.  Bug 436949 will be fixed
in a subsequent release.  Further discussion of function 
pkix_trace_dump_cert should happen in that bug.
Attachment #322633 - Flags: review-
Attachment #322633 - Flags: review?(alexei.volkov.bugs)
Blocks: 550929
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: