Last Comment Bug 397832 - libpkix leaks memory if a macro calls a function that returns an error
: libpkix leaks memory if a macro calls a function that returns an error
Status: RESOLVED FIXED
PKIX
: mlk
Product: NSS
Classification: Components
Component: Libraries (show other bugs)
: trunk
: All All
: P2 normal (vote)
: 3.12
Assigned To: Alexei Volkov
:
Mentors:
Depends on: 399300 402727 429230
Blocks: 403470 550929
  Show dependency treegraph
 
Reported: 2007-09-27 16:32 PDT by Alexei Volkov
Modified: 2010-03-08 09:50 PST (History)
15 users (show)
dsicore: blocking1.9+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Initial patch (139.21 KB, patch)
2007-11-02 18:25 PDT, Alexei Volkov
no flags Details | Diff | Review
Patch v2. (91.84 KB, patch)
2007-11-03 11:07 PDT, Alexei Volkov
nelson: review-
Details | Diff | Review
leaks in PKIX_ProcessingParams. v1 (15.06 KB, patch)
2007-11-08 15:47 PST, Alexei Volkov
nelson: review-
Details | Diff | Review
pkix_pl_hashtable.c chages (6.77 KB, patch)
2007-11-09 16:05 PST, Alexei Volkov
no flags Details | Diff | Review
Localize the scope of PKIX_MUTEX_LOCK/UNLOCK macros (14.50 KB, patch)
2007-11-09 17:51 PST, Alexei Volkov
nelson: review+
Details | Diff | Review
leaks in PKIX_ProcessingParams. v2 (12.95 KB, patch)
2007-11-12 12:27 PST, Alexei Volkov
nelson: review+
Details | Diff | Review
Main patch v3 (78.94 KB, patch)
2007-11-14 10:26 PST, Alexei Volkov
nelson: review-
Details | Diff | Review
Main patch v4 (80.23 KB, patch)
2007-11-15 17:06 PST, Alexei Volkov
nelson: review+
Details | Diff | Review
One more place where PKIX_Error is leaked (2.71 KB, patch)
2007-12-04 16:56 PST, Alexei Volkov
nelson: review+
Details | Diff | Review
Fix in pkix_VerifyNode_Create (1.33 KB, patch)
2007-12-05 12:05 PST, Alexei Volkov
nelson: review+
Details | Diff | Review
Object leaks in <object>_Create functions. (31.15 KB, patch)
2007-12-05 14:21 PST, Alexei Volkov
nelson: review+
Details | Diff | Review
One more set of objects leaks. (39.36 KB, patch)
2007-12-17 15:58 PST, Alexei Volkov
nelson: review+
Details | Diff | Review
Leak in pkix_pl_OcspRequest_Create (1.05 KB, patch)
2008-01-25 17:40 PST, Alexei Volkov
nelson: review+
Details | Diff | Review
Object leak test(patch v1) (69.60 KB, patch)
2008-02-26 11:02 PST, Alexei Volkov
slavomir.katuscak+mozilla: review-
nelson: superreview-
Details | Diff | Review
Object leak test(patch v2) (46.01 KB, patch)
2008-03-18 14:59 PDT, Alexei Volkov
no flags Details | Diff | Review
Object leak test(patch v3) (45.06 KB, patch)
2008-03-18 15:00 PDT, Alexei Volkov
no flags Details | Diff | Review
Object leak test(patch v4) (46.48 KB, patch)
2008-03-20 14:50 PDT, Alexei Volkov
nelson: review-
Details | Diff | Review
Object leak test(Patch v5) (48.32 KB, patch)
2008-03-21 10:06 PDT, Alexei Volkov
no flags Details | Diff | Review
Object leak patch(Patch v6) (47.56 KB, patch)
2008-03-21 16:09 PDT, Alexei Volkov
no flags Details | Diff | Review
Object leak patch(Patch v7) (47.32 KB, patch)
2008-03-21 16:23 PDT, Alexei Volkov
no flags Details | Diff | Review
Object leak test(missing new files) (1.36 KB, patch)
2008-03-24 10:05 PDT, Alexei Volkov
no flags Details | Diff | Review
Object leak patch v8 (48.06 KB, patch)
2008-03-24 12:36 PDT, Nelson Bolyard (seldom reads bugmail)
no flags Details | Diff | Review
Patch v8 for additional files, not modified in v1 (2.01 KB, patch)
2008-03-24 12:39 PDT, Nelson Bolyard (seldom reads bugmail)
nelson: review+
Details | Diff | Review
Object leak test(patch v9) (47.04 KB, patch)
2008-03-25 15:42 PDT, Alexei Volkov
nelson: review+
Details | Diff | Review
Two more leaks in libpkix (2.02 KB, patch)
2008-04-09 09:43 PDT, Alexei Volkov
nelson: review+
Details | Diff | Review
A few modifications to obj leak test code needed for tinderbox (4.96 KB, patch)
2008-04-14 14:23 PDT, Alexei Volkov
nelson: review+
Details | Diff | Review
Leak in pkix_PolicyChecker_MakeMutableCopy (758 bytes, patch)
2008-04-18 15:53 PDT, Alexei Volkov
nelson: review+
Details | Diff | Review
Additional modification to object leak test (9.33 KB, patch)
2008-04-18 15:57 PDT, Alexei Volkov
no flags Details | Diff | Review
(wrong patch) (2.08 KB, patch)
2008-05-22 15:01 PDT, Nelson Bolyard (seldom reads bugmail)
nelson: review-
Details | Diff | Review
The REAL patch to undo unapproved changes (not for immediate checkin) (2.10 KB, patch)
2008-05-22 15:04 PDT, Nelson Bolyard (seldom reads bugmail)
nelson: review-
Details | Diff | Review
Patch v42: Revert and use env var PKIX_VERBOSE_TRACE (3.90 KB, patch)
2008-05-27 06:15 PDT, Kai Engert (:kaie)
nelson: review-
Details | Diff | Review

Description Alexei Volkov 2007-09-27 16:32:29 PDT
PKIX_CHECK_ONLY_FATAL macro is defined in a way so that it immediately abort execution and returns received error if executed by this macro function returned a fatal error.

The macro should be fixed to at least have "goto cleanup" instruction.
Comment 1 Nelson Bolyard (seldom reads bugmail) 2007-09-27 17:26:29 PDT
The problem is bigger than just the PKIX_CHECK_ONLY_FATAL macro, and 
going to cleanup will not fix the leaks (by itself).  

Consider, for example, the case in function pkix_Build_InitiateBuildChain(
(which code may have been moved to function pkix_Build_CheckInCache by the
time you read this).  If function pkix_Build_ValidateEntireChain returns 
a FATAL error, that code will potentially leak 4 objects because it will 
bypass the following 4 calls:

        PKIX_DECREF(state->reversedCertChain);
        PKIX_DECREF(state->checkedCritExtOIDs);
        PKIX_DECREF(state->checkerChain);
        PKIX_DECREF(state->revCheckers);

Changing the macro to go to cleanup will not fix those leaks.

Perhaps we should make sure that no libPKIX function ever outputs an
object reference to its caller, and THEN gives a "fatal" error code.
Comment 2 Alexei Volkov 2007-10-01 15:41:39 PDT
The bug should be renamed to become more general. The reason - most of the macros defined by libpkix check the returned value(which is pointer to PKIX_Error in most of the cases) from commands that they called. The macro immediately call "return" from a caller function if the returned value is not equal to NULL. 

One of the example if PKIX_DECREF: 

 #define PKIX_DECREF(obj) \
    do { \
	if (obj){ \
	    pkixTempResult = PKIX_PL_Object_DecRef \
			((PKIX_PL_Object *)(obj), plContext); \
	    if (pkixTempResult) \
		return pkixTempResult; \
	    obj = NULL; \
	} \
    } while (0)

It check for pkixTempResult and return if it is NULL. If we consider Nelson's example in the comment 1, we will leak 3 out of four objects if first PKIX_DECREF failed. 
Comment 3 Brendan Eich [:brendan] 2007-10-11 16:44:11 PDT
return in macros except in very local (function-local macro) cases is a botch. What would happen if you first purged all return-in-macro uses?

/be
Comment 4 Nelson Bolyard (seldom reads bugmail) 2007-10-11 17:16:44 PDT
Brendan,
There was apparently a plan in which Return in macros was not a botch, but 
other aspects of the implementation were botched.  The design was that all 
the allocations done in the libPKIX code for a single cert path validation 
would come from an arenapool that would be allocated at the beginning of 
the path validation operation, and would be discarded at the end of the 
operation.  So the plan was that no memory would actually be leaked.
There is a lot of code in pre-PKIX NSS that works that way, quite 
successfully (without leaks).

But, somewhere between the plan and the coding of libPKIX things went awry.  
Instead of an arenapool for each operation, there is now one global arenapool, 
that is never freed.  

The present plan to resolve this is to fix the code to work as it was 
designed to work, relying on short-lived arenapools to avoid leaks.  
We know that long lived results will need to be copied out of the short-lived
arenapool, or else not allocated from it in the first place.  
Someone is now investigating to see if there are other problems with that.
In other words, if we revert to the original plan, what else will go wrong?
We'll know soon enough.
Comment 5 Alexei Volkov 2007-10-19 16:13:45 PDT
Propose a plan to overcome error condition memory leaks for discussion:

First of all, I assume that pkix library does not allocate memory from any arena. Arena approach for some libpkix objects can be introduced later without any harm to the project, since the libpkixcode has already a mechanism that will avoid deallocation of a memory that was allocated on arena.

This is going to be a big cleanup for pkix_tools.h and pkix_pl_object.c code, but I think that it will not take too much time to implement this approach.   

1. _FATAL macros shold no longer return from a caller function. The fatal error will be treated as any other errors, but we will make sure that we assert in debug build when such error happens.

2. Other macros from pkix_tools.h will be changed. return statement will be removed from macros. Macro generated "local"  errors will be treated the same way as a general errors in optimized build, and will be asserted in debug builds. To avoid leaking PKIX_Error objects generated by sequentially executed macros, errors will be added to the list of errors that will be available in function context. The first received error will be reported to the upper level functions.

3.  Review all functions that calls _FATAL macros (~40 macro invocations) to check for possible object leakage. These are the only macros that intentionally interrupt a function code execution without any cleanup.

Comments?

I'd' estimate this work to be done within 5 working days.
Comment 6 Nelson Bolyard (seldom reads bugmail) 2007-10-19 18:14:01 PDT
Alexei, you appear to be saying that you believe that only the "fatal" errors
create any leaks, and that all other non-fatal errors do not leak.  
Are you indeed saying that?  
Comment 7 Alexei Volkov 2007-10-19 20:17:39 PDT
That is correct. If any other errors occurred, execution will jump to "cleanup" where a function suppose to have proper code object decref/memory deallocation. In fact it is mandatory(you'll get compiler error otherwise) to have "cleanup" label if a function suppose to generate an error condition.
Comment 8 Alexei Volkov 2007-11-02 18:25:57 PDT
Created attachment 287188 [details] [diff] [review]
Initial patch

Fixes issues related to memory leaks due to premature return. Fix macros, error handling functions. Biggest impact to pkix_build.c(main chain building functionality) and pkix_tools.c (macros definition).
Comment 9 Nelson Bolyard (seldom reads bugmail) 2007-11-02 20:27:46 PDT
Comment on attachment 287188 [details] [diff] [review]
Initial patch

OK, here is some preliminary review feedback.

This megapatch is has parts that are separately reviewable.

For many files, the *only* change was to add cleanup: labels
to many functions, and add or remove blank lines.  
r+ for the patches to those files.  Please go ahead and 
commit the changes to those files now (ONLY the files where
the only changes are adding cleanup: and playing with 
blank lines).  Then submit a new patch which is the subset
of this megapatch for the files that you didn't commit.  
I'll continue my review of that new patch, which hopefully
will be a lot smaller.
Comment 10 Alexei Volkov 2007-11-03 11:07:51 PDT
Created attachment 287227 [details] [diff] [review]
Patch v2.

Main patch. Changes to files that have only addition of "cleanup" labels are already integrated.
Comment 11 Alexei Volkov 2007-11-05 12:07:33 PST
There are three main patterns of the code, that leak objects during error conditions:

1. _Create functions create an object, but do not destruct it in case of error in subsequent code. Example is the code at PKIX_CertChainChecker_Create function. If either of PKIX_INCREF fails, cert chain checker object will be leaked.

2. Use of macros that goto cleanup label in the blocks of code that follows "cleanup" label. Example is usage of PKIX_CHECK
at the end of the function pkix_BuildForwardDepthFirstSearch.

3. Object is lost if the code controlled by PKIX_CHECK macro(for example) executed after a function that create an object but before a reference to an object saved for future references. Example of such code can be found in pkix_build.c:

    after PKIX_PL_AIAMgr_Create is called at line 4034, the created object reference is not saved until line 4102:

    4102             state->buildConstants.aiaMgr = buildConstants.aiaMgr;

Line 4055 where buildConstants.aiaMgr get set to an address in aiaMgr should not be considered since buildConstants is allocated on the stack and will not be destructed.
Comment 12 Nelson Bolyard (seldom reads bugmail) 2007-11-06 18:05:25 PST
Comment on attachment 287227 [details] [diff] [review]
Patch v2.

This megapatch contains separable pieces.  I will provide separate reviews
for the pieces.  Some of the pieces will get r+ here, and can be checked
in immediately (please do).  Other parts will get r-, and others await
further review.  Overall, the megapatch gets r-

r+ for the changes to the following files:
+ mozilla/security/nss/lib/libpkix/pkix/crlsel/pkix_crlselector.c
+ lib/libpkix/pkix_pl_nss/module/pkix_pl_ldapcertstore.c  
+ lib/libpkix/pkix_pl_nss/module/pkix_pl_ldapdefaultclient.c
+ lib/libpkix/pkix_pl_nss/module/pkix_pl_pk11certstore.c
+ lib/libpkix/pkix_pl_nss/system/pkix_pl_common.c
The changes to these files can be committed immediately.  Please do.

I will send the rest of the review comments in email.
Comment 13 Alexei Volkov 2007-11-08 15:47:03 PST
Created attachment 287915 [details] [diff] [review]
leaks in PKIX_ProcessingParams. v1
Comment 14 Alexei Volkov 2007-11-09 16:05:08 PST
Created attachment 288067 [details] [diff] [review]
pkix_pl_hashtable.c chages
Comment 15 Damon Sicore (:damons) 2007-11-09 16:27:09 PST
Leaks are bad.  +'ing.
Comment 16 Alexei Volkov 2007-11-09 17:51:04 PST
Created attachment 288082 [details] [diff] [review]
Localize the scope of PKIX_MUTEX_LOCK/UNLOCK macros

Previous patch was obsoleted due to problems in the nss build.
Comment 17 Nelson Bolyard (seldom reads bugmail) 2007-11-09 18:00:22 PST
Comment on attachment 287915 [details] [diff] [review]
leaks in PKIX_ProcessingParams. v1

This patch modifies 4 files.  The modifications are independent.
r+ for three of the four files.  r+ for these files:

lib/libpkix/pkix/params/pkix_buildparams.c
lib/libpkix/pkix/params/pkix_trustanchor.c
lib/libpkix/pkix/params/pkix_valparams.c

Let's discuss the change to the remaining file
lib/libpkix/pkix/params/pkix_procparams.c
Comment 18 Nelson Bolyard (seldom reads bugmail) 2007-11-09 18:43:04 PST
Comment on attachment 287915 [details] [diff] [review]
leaks in PKIX_ProcessingParams. v1

I have previously given r+ for 3 of the files in this patch.
But now I must give r- for the changes to the file 
libpkix/pkix/params/pkix_procparams.c

There is one problem that occurs in most of the functions modified
in this patch.  It occurs after the label "cleanup", in code that 
looks like this:

>-        if (PKIX_ERROR_RECEIVED){
>-                PKIX_DECREF(date);
>+        if (PKIX_ERROR_RECEIVED) {
>+            PKIX_DECREF(params->date);
>         }

The problem is that, after label cleanup, the variable params can be
NULL, because there are error paths that goto cleanup if params is NULL.
So all the calls to PKIX_DECREF that pass a value in the params struct
must first check to see that params is not NULL, e.g. 

>+        if (PKIX_ERROR_RECEIVED && params) {
>+            PKIX_DECREF(params->date);
>         }
Comment 19 Slavomir Katuscak 2007-11-12 07:12:09 PST
Hi Alexei, I'm checking the history of changes in Tindebox + resolving failures in nightly tests. Seems that your patch from 2007-11-09 17:45 related to this bug causes many failures (2664/3558 Passed on most platforms). Please fix this ASAP. 
Comment 20 Alexei Volkov 2007-11-12 12:16:46 PST
Slavo, integration of changes to pkix_pl_pk11certstore.c introduce a regression: arena was deallocated twice in one of the function pkix_pl_Pk11CertStore_CertQuery. Fix is integrated.
Comment 21 Alexei Volkov 2007-11-12 12:27:42 PST
Created attachment 288359 [details] [diff] [review]
leaks in PKIX_ProcessingParams. v2

Nice catch, Nelson. This patch fixes the problem.
Comment 22 Nelson Bolyard (seldom reads bugmail) 2007-11-12 18:37:05 PST
Comment on attachment 288082 [details] [diff] [review]
Localize the scope of PKIX_MUTEX_LOCK/UNLOCK macros

r=nelson
Comment 23 Nelson Bolyard (seldom reads bugmail) 2007-11-12 20:18:45 PST
Comment on attachment 288359 [details] [diff] [review]
leaks in PKIX_ProcessingParams. v2

r=nelson
Comment 24 Alexei Volkov 2007-11-13 10:26:00 PST
Attachment 288359 [details] [diff] has been committed.
Comment 25 Alexei Volkov 2007-11-14 10:26:40 PST
Created attachment 288699 [details] [diff] [review]
Main patch v3
Comment 26 Alexei Volkov 2007-11-14 11:48:00 PST
patch 288082 has been integrated.
Comment 27 Nelson Bolyard (seldom reads bugmail) 2007-11-15 16:49:30 PST
Comment on attachment 288699 [details] [diff] [review]
Main patch v3

r-.  I gave the review comments to Alexei over the phone.
Comment 28 Alexei Volkov 2007-11-15 17:06:40 PST
Created attachment 288930 [details] [diff] [review]
Main patch v4
Comment 29 Nelson Bolyard (seldom reads bugmail) 2007-11-15 17:39:58 PST
Comment on attachment 288930 [details] [diff] [review]
Main patch v4

Alexei, Your latest patch includes this:

>+/* This local error check macro */
>+#define ERROR_CHECK(errCode) \
>+    if (pkixErrorResult) { \
>+        pkixTempErrorReceived = PKIX_TRUE; \
>+        pkixErrorClass = pkixErrorResult->errClass; \
>+        if (pkixErrorClass == PKIX_FATAL_ERROR) { \
>+            goto cleanup; \
>+        } \
>+        if (verifyNode) { \
>+            PKIX_INCREF(pkixErrorResult); \
>+            verifyNode->error = pkixErrorResult; \
>+            pkixErrorCode = errCode; \
>+        } \
>+        goto cleanup; \
>+    }

I think that last part should be:

>+        if (verifyNode) { \
>+            PKIX_INCREF(pkixErrorResult); \
>+            verifyNode->error = pkixErrorResult; \
>+        } \
>+        pkixErrorCode = errCode; \
>+        pkixErrorMsg = PKIX_ErrorText[errCode];
>+        goto cleanup; \
>+    }

Do you agree?
Comment 30 Alexei Volkov 2007-11-16 10:51:25 PST
I agree. An error code should be set regardless of verifyNode value.
Comment 31 Nelson Bolyard (seldom reads bugmail) 2007-11-16 14:53:47 PST
Comment on attachment 288930 [details] [diff] [review]
Main patch v4

r=nelson, provided that you make the change we discussed in comment 29.

I have some concerns with pkix_BuildForwardDepthFirstSearch(), but 
I'm not going to hold up this patch for those things, because I know 
there's more work to be done in this function later.
Comment 32 Alexei Volkov 2007-11-20 14:43:17 PST
Attachment 288930 [details] [diff] has been integrated.
Comment 33 Reed Loden [:reed] (use needinfo?) 2007-12-03 21:28:00 PST
Did this get checked-in? If so, can the bug be resolved as fixed?
Comment 34 Alexei Volkov 2007-12-04 16:56:28 PST
Created attachment 291570 [details] [diff] [review]
One more place where PKIX_Error is leaked
Comment 35 Nelson Bolyard (seldom reads bugmail) 2007-12-04 18:56:16 PST
Comment on attachment 291570 [details] [diff] [review]
One more place where PKIX_Error is leaked

r+ because the new code seems correct.  But please explain to me the path
through pkix_ForwardBuilderState_Create that you're trying to correct.
Was it a leak?  Or a double-free?
Comment 36 Alexei Volkov 2007-12-05 09:47:11 PST
It was a leak of PKIX_Error object. pkixTempResult is not checked before exit and its value is overridden with a first macro code.
Comment 37 Alexei Volkov 2007-12-05 10:02:51 PST
attachment 291570 [details] [diff] [review] in on the trunk.
Comment 38 Alexei Volkov 2007-12-05 11:59:37 PST
PKIX_Error object will be leaked only if pkix_VerifyNode_Create failed and return an error. The execution path: 
->PKIX_BuildChain
    ->pkix_Build_InitiateBuildChain
          ->pkix_VerifyNode_Create
          <-returns an error
       define new FATAL error and leak the error
       that was returned by pkix_VerifyNode_Create.
Comment 39 Alexei Volkov 2007-12-05 12:05:07 PST
Created attachment 291719 [details] [diff] [review]
Fix in pkix_VerifyNode_Create

pkix_VerifyNode_Create will leak PKIX_VerifyNode object pointed by "node" if error occurred after PKIX_PL_Object_Alloc call.
Comment 40 Nelson Bolyard (seldom reads bugmail) 2007-12-05 13:14:48 PST
Alexei, your explanation in comment 38 makes no mention of function
pkix_ForwardBuilderState_Create.  Please explain the leaking path of 
execution in that function.
Comment 41 Nelson Bolyard (seldom reads bugmail) 2007-12-05 13:20:15 PST
Comment on attachment 291719 [details] [diff] [review]
Fix in pkix_VerifyNode_Create

r=nelson
Comment 42 Alexei Volkov 2007-12-05 14:04:11 PST
(In reply to comment #40)
> Alexei, your explanation in comment 38 makes no mention of function
> pkix_ForwardBuilderState_Create.  Please explain the leaking path of 
> execution in that function.
> 

Oh, I see why you have questions about pkix_ForwardBuilderState_Create. The fix to this function make it not to leak created state in case of error, but has nothing to do with a leak of PKIX_Error object in function pkix_Build_InitiateBuildChain. Sorry for confusion.
Comment 43 Alexei Volkov 2007-12-05 14:21:43 PST
Created attachment 291756 [details] [diff] [review]
Object leaks in <object>_Create functions.

Fix object leaks that could happen in <object>_Create function due to occurred error conditions.
Comment 44 Alexei Volkov 2007-12-05 14:27:30 PST
attachment 291719 [details] [diff] [review] has been committed.
Comment 45 Alexei Volkov 2007-12-05 15:43:43 PST
(In reply to comment #43)
> Created an attachment (id=291756) [details]
> Object leaks in <object>_Create functions.
> 
> Fix object leaks that could happen in <object>_Create function due to occurred
> error conditions.
A bug was found during review of pkix_pl_CRL_CreateToList. Opening new bug 407064 since the bug is unrelated to this patch. 
Comment 46 Nelson Bolyard (seldom reads bugmail) 2007-12-05 18:35:18 PST
Comment on attachment 291756 [details] [diff] [review]
Object leaks in <object>_Create functions.

r=nelson
Comment 47 Alexei Volkov 2007-12-17 15:58:19 PST
Created attachment 293596 [details] [diff] [review]
One more set of objects leaks.

This patch fixes the last set of known object leaks.

What remains are memory leaks in pkix_pl_cert.c what will be fixed by consequent patch.
Comment 48 Nelson Bolyard (seldom reads bugmail) 2007-12-18 12:50:03 PST
Comment on attachment 293596 [details] [diff] [review]
One more set of objects leaks.

r+=nelson, with one caveat: I can't vouch for the correctness of function 
pkix_BuildForwardDepthFirstSearch.
It's too big, too long, with too many local variables for mere mortals to 
analyze.  Let's put on our list of lower priority cleanup work to decompose
that monster into smaller functions whose correctness is verifiable.
Comment 49 Alexei Volkov 2007-12-18 13:00:29 PST
I've opened a bug 402733 to trace leaks in pkix_build.c because of it's couple monstrous functions, pkix_BuildForwardDepthFirstSearch in particular. So the bug should be used to clean up pkix_build.c code.

Comment 50 Alexei Volkov 2007-12-18 16:42:37 PST
patch in attachment 293596 [details] [diff] [review] has been committed.
Comment 51 Alexei Volkov 2008-01-25 17:40:49 PST
Created attachment 299354 [details] [diff] [review]
Leak in pkix_pl_OcspRequest_Create
Comment 52 Nelson Bolyard (seldom reads bugmail) 2008-01-28 19:04:17 PST
Comment on attachment 299354 [details] [diff] [review]
Leak in pkix_pl_OcspRequest_Create

r=nelson
Comment 53 Alexei Volkov 2008-01-28 23:19:43 PST
attachment 299354 [details] [diff] [review] has been integrated.
Comment 54 Kai Engert (:kaie) 2008-01-29 18:24:39 PST
Comment on attachment 299354 [details] [diff] [review]
Leak in pkix_pl_OcspRequest_Create

Sorry, but I think this patch causes a crash. More info in the next comment (this patch edit field ist just too small)
Comment 55 Kai Engert (:kaie) 2008-01-29 18:27:21 PST
I was testing around EV and OCSP.
My test case was simple: Visit the paypal web site.

Using Firefox trunk + NSS Beta 1
 => works fine

Using Firefox trunk + NSS trunk 
 => crashes during the cleanup of pkix ocsp request

Using Firefox trunk + NSS trunk + attachment 299354 [details] [diff] [review] backed out
 => works fine

Please let me know if you would like a separate bug report for this, but I propose to simply back this out?

Comment 56 Nelson Bolyard (seldom reads bugmail) 2008-01-29 18:53:33 PST
In reply to comment 54:
Kai, do you know about the form text resizer bookmarklet? 
It will let you resize any form text entry box.  Very handy.

In reply to comment 55:
Please file a separate bug.  I believe this patch is correct.
The crash is due another bug, a latent bug that was formerly masked by the 
memory leak.  Correcting the memory leak reveals that other latent bug.

There's a big block comment in the function being patched here.  It warns
of a double free, unless the coder is "careful".  If your crash is in free,
or malloc, then I suspect the problem is a double free of that CERTCertificate 
object.  The writer of that comment apparently did not know how to "addref" 
an NSS CERTCertificate object.  That is done by calling CERT_DupCertificate.  
Comment 57 Jonas Sicking (:sicking) 2008-02-13 01:24:30 PST
What's the status here? Is this bug fixed or was the patch backed out. Note that if multiple code-paths think they own the memory and free it, there's a good chance of exploitable crashes.
Comment 58 Kai Engert (:kaie) 2008-02-14 02:20:07 PST
I can no longer reproduce the crash.

In other words, the latest trunk appears to work fine.
Comment 59 Jonas Sicking (:sicking) 2008-02-14 13:50:34 PST
So the patch was landed? Can this bug be marked fixed? We no longer leak?
Comment 60 Kai Engert (:kaie) 2008-02-14 14:19:08 PST
(In reply to comment #59)
> So the patch was landed? Can this bug be marked fixed? We no longer leak?

Yes, Alexei had checked this patch in, see comment 53.

I don't know what fixed the crash, maybe some other check in fixed it. As Nelson had said in comment 56 the cause of the crash was probably elsewhere.


I don't know if this bug is ready to be marked fixed.
Comment 61 Nelson Bolyard (seldom reads bugmail) 2008-02-14 14:23:47 PST
Jonas, A great many leaks have been fixed, and the fixes for those leaks are
now present in the drop of NSS being used presently in trunk browser builds.
We are not yet prepared to say that ALL leaks are fixed.  
But it may be that all (or the vast majority) of the leaks that were 
experienced by the browser are fixed.  Only a browser person can determine 
that.  If you conclude that the leaks in libPKIX are no longer a major issue
for the browser, then feel free to remove the blocking flag from this bug.
The NSS team did not set it, and the NSS team will not clear it.
Comment 62 Alexei Volkov 2008-02-26 11:02:52 PST
Created attachment 305807 [details] [diff] [review]
Object leak test(patch v1)

Code and script to check object/mem leaks in libpkix.
Some object leak fixes.
Comment 63 Alexei Volkov 2008-02-26 13:47:03 PST
Comment on attachment 305807 [details] [diff] [review]
Object leak test(patch v1)

Slavo, please review the part of the patch which modifies test scripts. In particular, I'm looking for review of the memory leak work I've done in libpkix.sh
Comment 64 Slavomir Katuscak 2008-02-27 06:30:56 PST
Alexei, I have some comments to your patch:
1. I don't understand why half of all.sh is removed. Is it only change you're using on your test machine ? This affects also other parts of testing and cleanup is missing there (should close HTML page).
2. libpkix_leak_test() - when extraOpt is set once, it's never cleaned up and is applied for all next while iterations, I see that you read data from vfychain_test.lst, but I don't see this file in CVS or in patch, so I can't check the content
3. libpkix_setup_db() - maybe using OUTPUT=`certutil ... 2>&1` would be faster then temporary file (in case you don't reach limit of the content size of shell variable, see bug 400103 comment 2), but this is not a critical
4. On some places you use checking if [ ${..} ], would be better to use -n, also is see some if [ ! -z ${..} ] in libpkix.sh.
Comment 65 Nelson Bolyard (seldom reads bugmail) 2008-02-27 17:10:04 PST
Comment on attachment 305807 [details] [diff] [review]
Object leak test(patch v1)

This patch apparently contains some unintended parts, so rather than review it,
I'll let you resubmit it with just the intended parts.
Please address Slavo's issues, except that I prefer the use of a temporary file
to relying on shell buffers being very large, so that point is OK as is.
Comment 66 Nelson Bolyard (seldom reads bugmail) 2008-02-29 10:46:13 PST
Comment on attachment 305807 [details] [diff] [review]
Object leak test(patch v1)

1. In source file lib/certhigh/certvfypkix.c Please move the declaration 
of this function:

> extern void
> cert_AddToVerifyLog(CERTVerifyLog *log, CERTCertificate *cert,
>                     unsigned long errorCode, unsigned int depth,
>                     void *arg);

out of this file and into the appropriate header file.  We really need to 
stamp out all extern function declarations in .c files!

2. In function cert_VerifyCertChainPkix, you added a label "start" near
the beginning, and then at the very end, before the final return rv,
you added a conditional goto that goes back to start.  Please replace
the label start with 
   do {
and replace the conditional goto with
   } while (errorGenerated);
Don't worry about the indentation.

BTW, PLEASE do not code any tests that read "== PR_TRUE".
Let's not write code that appears to have been written by newbies.
If they're booleans, just test them without comparison, e.g. rather than 
   if (foo == PR_TRUE)
code
   if (foo)

3.  Please, no use of stdin, stdout, stderr in libNSS.  So delete this:
>+    printf("Loop %d\n", memLeakLoopCount++);

4. This error is misspelled, and its meaning is unclear.
>+PKIX_ERRORENTRY(GENERATEDERRORRECEIVED, Recieve generated error,0),
                                             ^^
I have no idea what that means.  To what does "receive" refer?
Does that mean we received an error that was generated by something?  
Or that the act of receiving something generated an error?  

5. Here's another error code whose meaning is unclear.
PKIX_ERRORENTRY(NSSCERTIFICATEUSAGETOPKIXKUANDEKUFAILED,Nss certificate usage to pkix ku and eku failed,0),

6. Don't initialize external or static variables with zero values.
It's unnecessary because the c language guarantees all such variables are initialized to zero, and it increases the code file size.
Just leave them uninitialized.  Here are some examples, but please fix 
this throughout libPKIX code, starting with this patch.

>+PKIX_Boolean pkixInitInProgress = PKIX_FALSE;
>+PKIX_Boolean pkixShutdownInProgress = PKIX_FALSE;
>+PKIX_UInt32 stackPosition = 0;

7. In pkix_Throw, you added this line:
                      PKIX_INCREF(cause);
I don't understand why.  cause is not subsequently decref'ed.

9. MAJOR: New function pkix_ErrorGen_ProgressHash trashes the stack. 
The problem is this definition:

>+          PKIX_UInt32 *rvc = (PKIX_UInt32 *)&rv;

That needs to be an unsigned char pointer, not an int32 pointer.
Does this discovery invalidate any of the test results?

10.  regarding these lines:

> /* #define TEST_START_FN "pkix_BuildForwardDepthFirstSearch" */
> /* #define TEST_START_FN_STACK_POS 4 */
> #define TEST_START_FN "PKIX_BuildChain"
> #define TEST_START_FN_STACK_POS 2

a) please add comments explaining what these symbols are and do.
b) please if some sort of #if rather than commenting out those two
lines, or else just remove those two commented lines.

11. MAJOR: Your new definition of macro PKIX_STD_VARS really needs to be 
moved into a function.  We don't want all this code in every libPKIX
function.  But this issue can be fixed after this patch (or its successor)
is checked in.

12. MAJOR: In function pkix_pl_Cert_CreateToList, your patch appears to 
break this function.  If CERT_DecodeDERCertificate returns a cert, you 
toss it, but if it returns NULL, yo try to create a PKIX cert from NULL.

13. Your patch removes a call to PKIX_OBJECT_UNLOCK(lockedObject);
from PKIX_PL_Cert_GetSubjectPublicKey.  That doesn't seem right.
Let's discuss it.

14.  The reason for the changes in file 
lib/libpkix/pkix_pl_nss/system/pkix_pl_hashtable.c  are not apparent.
Please explain them to me, and then add a large comment to that file
explaining it.

15. Remove this comment:
  /* Turn this code on again after bug 397832 get fixed. */
because your patch "turns this code on again".

16. In function pkix_pl_Object_Destroy, you removed this line:
>-        objectHeader->magicHeader = 0;
why?  wasn't that there to detect references to destroyed objects?
Comment 67 Alexei Volkov 2008-02-29 14:08:56 PST
Thank for review, Slavo

> 1. I don't understand why half of all.sh is removed. Is it only change you're
I've attached a wrong file. Disregard.  

> 2. libpkix_leak_test() - when extraOpt is set once, it's never cleaned up and
> is applied for all next while iterations, I see that you read data from
> vfychain_test.lst, but I don't see this file in CVS or in patch, so I can't
> check the content
Good catch.

Comment 68 Alexei Volkov 2008-03-18 14:59:36 PDT
Created attachment 310352 [details] [diff] [review]
Object leak test(patch v2)

Changes according review comments.
Comment 69 Alexei Volkov 2008-03-18 15:00:47 PDT
Created attachment 310353 [details] [diff] [review]
Object leak test(patch v3)

Changes according review comments.
Comment 70 Nelson Bolyard (seldom reads bugmail) 2008-03-18 17:03:49 PDT
Comment on attachment 310353 [details] [diff] [review]
Object leak test(patch v3)

Alexei, please make a version of this patch that bugzilla can compare the patch v1.  The new patch should patch the same files as the old one, and the same file versions as the old one.  Any patches to any new files (not previously patched in the old patch) can be made into a separate patch.  
Alterantively, you can carry the old patch forward to current file versions, so that the two patches are comparable.
Comment 71 Alexei Volkov 2008-03-20 14:31:45 PDT
(In reply to comment #66)
> (From update of attachment 305807 [details] [diff] [review])
> 7. In pkix_Throw, you added this line:
>                       PKIX_INCREF(cause);
> I don't understand why.  cause is not subsequently decref'ed.
Code returns the cause as a newly created error. The count on reference
on the cause should be incremented. Otherwise we will get a dangled pointer
after exiting the pkix_Throw call.

> 9. MAJOR: New function pkix_ErrorGen_ProgressHash trashes the stack. 
Function is not used: removed.

> 12. MAJOR: In function pkix_pl_Cert_CreateToList, your patch appears to 
> break this function.  If CERT_DecodeDERCertificate returns a cert, you 
> toss it, but if it returns NULL, yo try to create a PKIX cert from NULL.
Nice catch. Thx!

> 13. Your patch removes a call to PKIX_OBJECT_UNLOCK(lockedObject);
> from PKIX_PL_Cert_GetSubjectPublicKey.  That doesn't seem right.
> Let's discuss it.
I've removed macro code and replace it with a function call. The
function calls PKIX_OBJECT_UNLOCK.

> 14.  The reason for the changes in file 
> lib/libpkix/pkix_pl_nss/system/pkix_pl_hashtable.c  are not apparent.
> Please explain them to me, and then add a large comment to that file
> explaining it.
Use of hashtable makes object leak testing impossible as it saves pointer
to cert,crl and cert chains. Disable ht functionality to the testing.
Comment 72 Alexei Volkov 2008-03-20 14:50:24 PDT
Created attachment 310842 [details] [diff] [review]
Object leak test(patch v4)

Patch that was made against 02/26/08 NSS code base.
Comment 73 Alexei Volkov 2008-03-20 14:55:35 PDT
Comment on attachment 310842 [details] [diff] [review]
Object leak test(patch v4)

This patch is only missing modification to mozilla/security/coreconf/config.mk file as following:
Index: coreconf/config.mk
===================================================================
RCS file: /cvsroot/mozilla/security/coreconf/config.mk,v
retrieving revision 1.22
diff -U 3 -p -r1.22 config.mk
--- coreconf/config.mk  29 Oct 2007 22:10:53 -0000      1.22
+++ coreconf/config.mk  20 Mar 2008 21:55:00 -0000
@@ -190,6 +190,10 @@ ifdef BUILD_LIBPKIX_TESTS
 DEFINES += -DBUILD_LIBPKIX_TESTS
 endif
 
+ifdef PKIX_OBJECT_LEAK_TEST
+DEFINES += -DPKIX_OBJECT_LEAK_TEST
+endif
+
Comment 74 Nelson Bolyard (seldom reads bugmail) 2008-03-20 17:16:47 PDT
Comment on attachment 310842 [details] [diff] [review]
Object leak test(patch v4)

Alexei, when I view the interdiff of this patch against the last patch I 
reviewed, which was patch v1, as seen in this URL
https://bugzilla.mozilla.org/attachment.cgi?oldid=305807&action=interdiff&newid=310842&headers=1
I see lots of problems.   Some lines are shown to be deleted that this patch 
actually adds.  Lines from .c files are shown as appearing in shell scripts.  
In short, I simply cannot rely on the interdiff output to tell me what really 
changed in this patch.  Please try again.
Comment 75 Alexei Volkov 2008-03-21 10:06:22 PDT
Created attachment 311000 [details] [diff] [review]
Object leak test(Patch v5)

Diff between v4 and v1 is messed up at the end of the representation: changes that were added into the v4 patch a shown as they were added to patch v1. But it easy to recognize the problem, since there is no dash line in between diff of the patches.
One more try...
Comment 76 Alexei Volkov 2008-03-21 10:13:07 PDT
Again, there are small number of changes(only two blocks at the and) that are incorrectly represented when you do diff of v5 and v1.

void cert_AddToVerifyLog

and

PKIX_DECREF(oid); 

are added to the patch version 5 and are not present at v1. But you can notice, that there is no dash line to the changes right side and so it is easy to spot.

The rest of the changes are shown correctly.
Comment 77 Nelson Bolyard (seldom reads bugmail) 2008-03-21 13:03:31 PDT
Also, the patch information from several different files is mixed together.
When I see that, I don't trust these interdiff results AT ALL.  Sorry.
This patch needs to be split into two parts, one part that patches ONLY 
the same files as patch v1, and a separate patch that patches any additional
files.  
Comment 78 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-03-21 13:13:41 PDT
Would it perhaps be easier for you to copy the relevant subset of your tree, apply v1 to one and vN to the other, and then do a diff locally?  Making Alexei work around bugzilla's silly interdiff limitations seems inefficient.
Comment 79 Alexei Volkov 2008-03-21 16:09:11 PDT
Created attachment 311069 [details] [diff] [review]
Object leak patch(Patch v6)
Comment 80 Alexei Volkov 2008-03-21 16:23:38 PDT
Created attachment 311077 [details] [diff] [review]
Object leak patch(Patch v7)
Comment 81 Alexei Volkov 2008-03-24 10:05:45 PDT
Created attachment 311397 [details] [diff] [review]
Object leak test(missing new files)
Comment 82 Nelson Bolyard (seldom reads bugmail) 2008-03-24 12:36:43 PDT
Created attachment 311420 [details] [diff] [review]
Object leak patch v8

To see why interdiff continued to complain that the patches v1 and v7 are
for different sets of files, I downloaded both patches and ran this command:

grep Index /tmp/397832v[17].txt | cut -f 2 -d ' ' | sort | uniq -c | grep 1

The output was 

   1 nss/lib/libpkix/pkix/top/pkix_build.c

So, I found the diff for that file in patch v7 and removed it, producing
patch v8.  

I then verified that the file version numbers in patches v1 and v8 were all
the same with this command:

grep '^---' /tmp/397832v[18].txt | cut -f 2-4 -d ':' | sort | uniq -c | \
grep '^   1'

Hopefully v1 and v8 will compare without any complaints about different 
versions of sets of files.
Comment 83 Nelson Bolyard (seldom reads bugmail) 2008-03-24 12:39:19 PDT
Created attachment 311421 [details] [diff] [review]
Patch v8 for additional files, not modified in v1

This patch contains the diffs for the files that were not included in v1
Comment 84 Nelson Bolyard (seldom reads bugmail) 2008-03-24 12:44:31 PDT
Comment on attachment 311421 [details] [diff] [review]
Patch v8 for additional files, not modified in v1

r+ for the changes in this small patch.
Comment 85 Alexei Volkov 2008-03-24 14:10:19 PDT
Comment on attachment 311420 [details] [diff] [review]
Object leak patch v8

thank you, Nelson. Diff between v1 and v8 works correctly now. Please review it.
Comment 86 Nelson Bolyard (seldom reads bugmail) 2008-03-24 15:23:18 PDT
Alexei, You're supposedly on vacation today!  If not, call me.  :)

I'm not done with this review, but here are some preliminary review comments.

1. This patch adds more "extern" declarations of variables and/or functions
in .c files, rather than putting those declarations in header files where 
they belong.  See certvfypkix.c  pkix_lifecycle.c
I'm not asking (in this bug) that all existing extern declarations in .c 
files be removed. That's the subject of bug 424857. But I'm asking that
this bug/patch not add any new extern declarations to .c files.

2. Is is true that all this leak testing is single-threaded testing?
I ask because there are several aspects of this new leak testing code that
are not safe or correct for multi-threaded operation.  MAYBE that's OK if 
the libPKIX leak testing is all single threaded, but otherwise it's not.

examples:
a) The global variables used to track some internal state of a single cert
verification (on a single thread) in pkix_tools.c.  If multiple threads 
attempt cert verification simultaneously, they will step on each other.  

b) The new functions pkix_pl_lifecycle_ObjectTableUpdate and pkix_pl_lifecycle_ObjectLeakCheck make a copy of the global per-class object counts at the beginning of a verification, and then check that the copied numbers against the global values again at the end of the verification to 
see if they have changed, indicating a leak.  Here again, if multiple threads
are running, these results will be incorrect.  

If we decided that this code will be OK anyway, because the leak-checking 
builds are supposed to be used ONLY in single-threaded testing, then I'd 
say we need to put some code into leak checking builds that detects that 
multiple threads are using libPKIX and aborts if so.  
Comment 87 Nelson Bolyard (seldom reads bugmail) 2008-03-24 15:41:26 PDT
In reply to comment #82, it occurs to me, in retrospect that 
    uniq -u
is simpler and potentially more accurate than either 
>   uniq -c | grep 1
or 
>   uniq -c | grep '^   1'
Comment 88 Alexei Volkov 2008-03-25 15:15:00 PDT
(In reply to comment #84)
> (From update of attachment 311421 [details] [diff] [review])
> r+ for the changes in this small patch.

The patch is integrated.
Comment 89 Alexei Volkov 2008-03-25 15:42:52 PDT
Created attachment 311680 [details] [diff] [review]
Object leak test(patch v9)

Address review comments:
  move exports into pkix_tool.h
  disallow object leak test code to be built for optimized libraries.
  prevent multi-threaded object leak test runs.
Comment 90 Nelson Bolyard (seldom reads bugmail) 2008-03-25 18:12:19 PDT
Comment on attachment 311680 [details] [diff] [review]
Object leak test(patch v9)

r=nelson
Comment 91 Alexei Volkov 2008-03-26 11:49:34 PDT
object leak test patch is integrated.
Comment 92 Alexei Volkov 2008-04-09 09:43:00 PDT
Created attachment 314614 [details] [diff] [review]
Two more leaks in libpkix
Comment 93 Nelson Bolyard (seldom reads bugmail) 2008-04-09 13:19:05 PDT
Comment on attachment 314614 [details] [diff] [review]
Two more leaks in libpkix

r+
Comment 94 Alexei Volkov 2008-04-14 14:23:28 PDT
Created attachment 315629 [details] [diff] [review]
A few modifications to obj leak test code needed for tinderbox
Comment 95 Alexei Volkov 2008-04-14 14:25:58 PDT
Comment on attachment 315629 [details] [diff] [review]
A few modifications to obj leak test code needed for tinderbox

stackErrorCodes array is added to printout error descriptions that caused object leak.
Added -f "${R_PWFILE}" argument to certutil to provide db password.
Comment 96 Nelson Bolyard (seldom reads bugmail) 2008-04-14 17:05:43 PDT
Comment on attachment 315629 [details] [diff] [review]
A few modifications to obj leak test code needed for tinderbox

This patch changes 3 files.  I'm giving r+ for the changes to the
files tests/libpkix/libpkix.sh and lib/libpkix/pkix/util/pkix_tools.h

I want to go over the changes to file lib/certhigh/certvfypkix.c 
with you in person, Alexei, before giving a final review to them.
Comment 97 Nelson Bolyard (seldom reads bugmail) 2008-04-15 15:15:52 PDT
Comment on attachment 315629 [details] [diff] [review]
A few modifications to obj leak test code needed for tinderbox

r=nelson
Comment 98 Alexei Volkov 2008-04-15 15:41:34 PDT
Patch 315629 is integrated.
Comment 99 Alexei Volkov 2008-04-17 10:27:30 PDT
All patches are integrated. Object leak tinderbox runs without indicating any problems. Closing the bug..
Comment 100 Alexei Volkov 2008-04-18 15:51:55 PDT
New leak is found in libpkix. Some modification are also need to object leak test to run tests through CERT_PKIXVerifyCert.
Comment 101 Alexei Volkov 2008-04-18 15:53:41 PDT
Created attachment 316504 [details] [diff] [review]
Leak in pkix_PolicyChecker_MakeMutableCopy
Comment 102 Alexei Volkov 2008-04-18 15:57:48 PDT
Created attachment 316505 [details] [diff] [review]
Additional modification to object leak test

Allows to printout a stack at which the error was generated that cause an object leak.
Adding paypal EE cert to be a part of the leak test.
Comment 103 Nelson Bolyard (seldom reads bugmail) 2008-04-18 18:32:04 PDT
Comment on attachment 316505 [details] [diff] [review]
Additional modification to object leak test

Alexei, let's go over this patch together at your soonest convenience.
Comment 104 Brendan Eich [:brendan] 2008-04-19 00:04:26 PDT
New bug for new patch is best. REOPEN and more patching should mean the patch before the bug was closed was backed out.

/be
Comment 105 Alexei Volkov 2008-04-21 14:28:30 PDT
Comment on attachment 316505 [details] [diff] [review]
Additional modification to object leak test

Will use bug 430135 for object leak improvements.
Comment 106 Alexei Volkov 2008-04-21 14:28:45 PDT
316505 is integrated. Closing
Comment 107 Kai Engert (:kaie) 2008-05-21 09:56:39 PDT
Alexei, when you made this check in to this bug:

http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=alexei.volkov.bugs%25sun.com&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2008-04-08&maxdate=2008-04-10&cvsroot=%2Fcvsroot

you have done unrelated changes, reverted some of my tracing code.

In particular, you changed PR_LOGGING back to DEBUG_kaie and removed an PKIX_ENTER call.


@Christophe: This is the cause of the build error you saw.
Comment 108 Nelson Bolyard (seldom reads bugmail) 2008-05-21 12:46:35 PDT
Kai, I agree that the checkin did not match the patch that was reviewed,
which was attachment 314614 [details] [diff] [review].

Alexei is on vacation now until sometime in June, so if this needs to be
fixed before then, someone else must write a patch to undo the unreviewed
changes. 
Comment 109 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-05-22 05:50:41 PDT
Seems like a job for the reviewer, since it would be silly to escalate this to Alexei's CVS voucher.
Comment 110 Nelson Bolyard (seldom reads bugmail) 2008-05-22 15:01:29 PDT
Created attachment 322177 [details] [diff] [review]
(wrong patch)

I created a patch to undo the changes that were not part of the originally
reviewed patch, by backing out the committed changes, and then applying the 
original patch.  That patch is attached for reference, but I am giving it an
r- because I do not want to see it committed. In studying this patch, I see 
two differences:

1) some #ifdefs were changed from 
  #ifdef PR_LOGGING 
to 
  #ifdef DEBUG_kaie 

2) A call to PKIX_ENTER was removed from function  pkix_trace_dump_cert.

I understand the reason for the first of those changes.  It solved a real
problem.  I recall discussing it with several people at the time. 
It is possible that I was asked for and gave a verbal approval to make that change, as a temporary solution to the problem.  I do not recall clearly now, 
but had I been asked, I would have given that approval, and today I am willing
to retroactively approve that already-committed change for that purpose.  

The ifdef change was done to stop the very verbose logging output that was 
appearing in all our test runs and tinderbox runs, and was swelling the 
output log files considerably.  Other NSS developers found that they had 
difficulty finding the output they wanted in the midst of all that extra 
debug output.  Also, they didn't understand all that output. 
Changing that ifdef caused that debug output to cease in Tinderbox run logs, 
and in the test runs performed by the other NSS developers.  

The real issue there, IMO, is that that verbose debug output is enabled by
the presence of an environment variable that has another primary purpose, 
and is used in all our testing.  That environment variable was overloaded.
In libpkix/pkix_pl_nss/system/pkix_pl_lifecycle.c function PKIX_PL_Initialize 
we see:
171         if (PR_GetEnv("NSS_STRICT_SHUTDOWN")) {
172             pkixLog = PR_NewLogModule("pkix");
173         }

I would be very willing to undo the changes to those ifdefs at the same
time as the above environment variable is changed to some other, such as 
PKIX_VERBOSE_TRACE or something similar.  If someone would prepare a patch
that combines the one attached to this comment with a change to that 
environment variable, I would give it serious consideration.

The second change seen in the attached patch, the removal of the call to 
PKIX_ENTER in function pkix_trace_dump_cert, may have a similar technical 
justification.  I suspect that it was causing other PKIX logging and/or pkix 
error stack handling to receive lots of extra noise, even in the case where 
NSS_STRICT_SHUTDOWN was not defined.  Whatever the reason, I do not think
it presents a problem that needs urgent attention, and I would prefer to 
have Alexei tell us about it when he returns.  However, I am open to being
persuaded that this does need urgent attention, if a good reason can be given.
Comment 111 Nelson Bolyard (seldom reads bugmail) 2008-05-22 15:04:46 PDT
Created attachment 322178 [details] [diff] [review]
The REAL patch to undo unapproved changes (not for immediate checkin)

Sorry, that last attachment was just a copy of the previously reviewed patch.
This is the patch file that I made.
Comment 112 Kai Engert (:kaie) 2008-05-27 05:58:44 PDT
(In reply to comment #110)
> The ifdef change was done to stop the very verbose logging output that was 
> appearing in all our test runs and tinderbox runs, and was swelling the 
> output log files considerably.

I do not remember having seen this.
Was there a bug filed?
It would have been good to cc me (I hope I didn't miss it).


> Other NSS developers found that they had 
> difficulty finding the output they wanted in the midst of all that extra 
> debug output.  Also, they didn't understand all that output. 
> Changing that ifdef caused that debug output to cease in Tinderbox run logs, 
> and in the test runs performed by the other NSS developers.  
> 
> The real issue there, IMO, is that that verbose debug output is enabled by
> the presence of an environment variable that has another primary purpose, 
> and is used in all our testing.  That environment variable was overloaded.
> In libpkix/pkix_pl_nss/system/pkix_pl_lifecycle.c function PKIX_PL_Initialize 
> we see:
> 171         if (PR_GetEnv("NSS_STRICT_SHUTDOWN")) {
> 172             pkixLog = PR_NewLogModule("pkix");
> 173         }
> 
> I would be very willing to undo the changes to those ifdefs at the same
> time as the above environment variable is changed to some other, such as 
> PKIX_VERBOSE_TRACE or something similar.  If someone would prepare a patch
> that combines the one attached to this comment with a change to that 
> environment variable, I would give it serious consideration.

I'm ok with the idea to use a new env var.

So, are we ok to keep having this wrapped in PR_LOGGING, assuming we switch to a separate env var?


> The second change seen in the attached patch, the removal of the call to 
> PKIX_ENTER in function pkix_trace_dump_cert, may have a similar technical 
> justification.  I suspect that it was causing other PKIX logging and/or pkix 
> error stack handling to receive lots of extra noise, even in the case where 
> NSS_STRICT_SHUTDOWN was not defined.

I don't know what the reason was, but without this PKIX_ENTER, the code does not compile at all.


> Whatever the reason, I do not think
> it presents a problem that needs urgent attention, and I would prefer to 
> have Alexei tell us about it when he returns.

Not urgent unless you want to compile it (as Christophe recently wanted to).

Why would it be bad to add an PKIX_ENTER statement, if this is wrapped inside our new debugging variable anyway? I suspect you have missed that it is wrapped.
Comment 113 Kai Engert (:kaie) 2008-05-27 06:15:21 PDT
Created attachment 322633 [details] [diff] [review]
Patch v42: Revert and use env var PKIX_VERBOSE_TRACE

Nelson, this is your patch, I'm giving r=kaie to your changes.
This adds your proposed environment variable.
Comment 114 Nelson Bolyard (seldom reads bugmail) 2008-05-27 09:57:10 PDT
Comment on attachment 322633 [details] [diff] [review]
Patch v42: Revert and use env var PKIX_VERBOSE_TRACE

Kai, I may be mistaken, but I think that any function that calls PKIX_ENTER 
must also return by calling PKIX_RETURN, and must be a function that returns 
a PKIX_ERROR * (a ptr to a PKIX_ERROR).  I think that, if this rule is not 
followed, the error handling code may have problems.   

I suspect that what happened was this:
First, Alexei removed the PKIX_ENTER, since this function doesn't generally 
follow the PKIX library API conventions, and causes the error handling 
problems.  Then he saw that it didn't compile, so he changed the ifdef to 
remove it, rather than try to redesign it.  

I have no objection, in principle, to re-enabling this code in debug builds
provided that it is conditioned on an environment variable that is not used
in our QA test runs (including Tinderbox).  I'm giving this review request
to Alexei, who can tell us more about the requirements for PKIX_ENTER.
Comment 115 Nelson Bolyard (seldom reads bugmail) 2008-06-02 16:21:48 PDT
Comment on attachment 322633 [details] [diff] [review]
Patch v42: Revert and use env var PKIX_VERBOSE_TRACE

I was right that, without the PKIX_RETURN macro, function pkix_trace_dump_cert will leak any PKIX_ERRORs that are returned by 
the other libPKIX functions it calls.  Also, with the PKIX_ENTER macro 
in place, every call to this function will log the fact that this 
function was called.  So, I have filed bug 436949 to track the work to 
fix function pkix_trace_dump_cert and restore it (change the ifdef).  

This bug (Bug 397832) was fixed in 3.12.0.  Bug 436949 will be fixed
in a subsequent release.  Further discussion of function 
pkix_trace_dump_cert should happen in that bug.

Note You need to log in before you can comment on or make changes to this bug.