Closed Bug 428038 Opened 12 years ago Closed 3 years ago

Crash destroying cert in [@ CERT_DestroyCertificate - cert_pkixDestroyValOutParam]

Categories

(NSS :: Libraries, defect, P1)

x86
Windows Vista
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME
3.12.2

People

(Reporter: marcia, Assigned: alvolkov.bgs)

References

Details

(Whiteboard: PKIX)

Crash Data

Attachments

(1 file, 3 obsolete files)

Seen while updating nightlies - I was updating to Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9pre) Gecko/2008040907 Minefield/3.0pre. I have a Master Password set for this profile

STR:
1. With about 20 tabs open, I was updating my build. I had several downloads in progress and some that had completed.
2. After the browser restarted from software update, I get download dialogs asking me if I want to save the various files. I hit cancel. The browser became unresponsive. I hit cancel again and I crashed. Here is the breakpad URL:
http://crash-stats.mozilla.com/report/index/55601433-0656-11dd-a1e2-001321b13766

I can list some of the sites that were open at the time of the crash if that is useful.
0  nssCertificate_Destroy  	pki/certificate.c:128
1  CERT_DestroyCertificate 	certdb/stanpcertdb.c:755
2  cert_pkixDestroyValOutParam 	certhigh/certvfypkix.c:1787
3  CERT_PKIXVerifyCert 	        certhigh/certvfypkix.c:2226
4  xul.dll@0x2e1219 	
5  nsNSSCertificate::getValidEVOidTag 	nsIdentityChecking.cpp:957
6  nsNSSCertificate::GetIsExtendedValidation nsIdentityChecking.cpp:983
7  AuthCertificateCallback 	nsNSSCallbacks.cpp:928
8  ssl3_HandleCertificate 	ssl/ssl3con.c:7260
9  ssl3_HandleHandshakeMessage 	ssl/ssl3con.c:7938
10 ssl3_HandleHandshake 	ssl/ssl3con.c:8062
Assignee: nobody → alexei.volkov.bugs
Priority: -- → P1
Summary: Crash in nssCertificate_Destroy → Crash destroying cert in cert_pkixDestroyValOutParam
Whiteboard: PKIX
Target Milestone: --- → 3.12
A quite possible reason for the failure is that output parameters were not initialized with NULL. PSM does not do it because the definition of API does require it. On the other hand, CERT_PKIXVerifyCert does not do either before starting to assign output pointers. 

Application may request a validation log to be return. But it needs to initialize the log pointer with non-zero value. Similarly, I was expecting application to initialize requested output parameter pointers. 

Which side should be responsible for zeroing these pointers?
CC'ing Bob and Nelson for opinions, whether PSM or NSS is responsible for initializing output variables to zero/null.

I'm ok to do it in PSM, but I had expected that NSS would do it.

In my opinion, PSM will not look at the output variables unless CERT_PKIXVerifyCert returns with success.

From the stack in comment 1 we can see that the crash happens before the function returns, meaning, NSS accesses the variables that are described in the API as pure output parameters.

If NSS depends on output parameters to be initialized, then I think the API should clearly mention it in its interface comments.

But even if we add it, I think it's not "intuitive" that output parameters must be initialized by the caller, and I expect that future programmers might make the same mistake.

I propose that NSS initializes all parameters that are declared as "output only" immediately on entering the NSS function, so that NSS can assume a safe state.
I suspect there has been some confusion about whether output parameters that
are pointers are variables that CERT_PKIXVerifyCert is supposed to fill in 
with pointers that it supplies, or are pointers to memory locations that 
CERT_PKIXVerifyCert is supposed to fill in.  The very term "output parameter"
usually signifies a pointer to a location that is filled in by the called 
function.  But in the case of this function, where arguments are passed in 
an array of non-const structs, I think that output parameters, including 
pointers, are values that NSS is expected to fill in directly.

The API definition should be absolutely clear about that issue.  If the 
output parameter's pointer value itself is a value that CERT_PKIXVerifyCert
is expected to fill in, the API must say that clearly.

Then, once that is clarified, I agree that CERT_PKIXVerifyCert itself must
initialize any output values (values that CERT_PKIXVerifyCert is expected 
to fill in) upon which it is later going to depend.  

Personally, I have always believed that the best strategy is not to modify 
output parameters until you're sure that you're going to really do the output.
That is, I would prefer that the output parameter never be filled in and then
subsequently destroyed.  I would prefer that it be filled in AFTER it is clear
that the operation is going to succeed in its entirety.  That may mean keeping
a local copy of the output parameters private to the function until it is all
done, then copying out the value of those output parameters to the caller's
copy when the operation is done.  But, having said that, I do not require 
that behavior for this function.
(In reply to comment #4)
> I suspect there has been some confusion about whether output parameters that
> are pointers are variables that CERT_PKIXVerifyCert is supposed to fill in 
> with pointers that it supplies, or are pointers to memory locations that 
> CERT_PKIXVerifyCert is supposed to fill in.  

I think this question is not being asked.

If the callers wants output parameters, then the caller is required to pass in CERTValOutParam*, an array that has one entry for each desired output parameter.

To me it appears obvious that each array entry must specify the desired type of output parameter.

To me it appears obvious, that the provided array is only able to carry simple types (int) or pointers.

To me it appears obvious, that the objects returned by NSS can be complex object, like lists.

I think it should be obvious to expect that NSS should allocate any output parameters that NSS places into the output parameter struture.

(If the caller pre-allocates output parameters, it might complicate the life for the implementation used by NSS, as it might have to copy stuff over. I think it's really simpler if NSS is responsible for allocating stuff and returning it.)


> The very term "output parameter"
> usually signifies a pointer to a location that is filled in by the called 
> function.  

If the function is expected to fill in stuff, it must be a pointer to a pointer.

You are talking about functions like
  give_me_a_result_pointer(char **output);

Where output is the location of a pointer that can obtain the result.


In our example we have 
  give_me_a_result_pointer(complex_type *output);

Where again, output is the location of something that can obtain the result. Only difference, it has multiple locations that can obtain results.


> But in the case of this function, where arguments are passed in 
> an array of non-const structs, I think that output parameters, including 
> pointers, are values that NSS is expected to fill in directly.

I don't see much difference.

But I agree with you. I'd just say, in both scenarios NSS is expected to fill in.


> The API definition should be absolutely clear about that issue.  If the 
> output parameter's pointer value itself is a value that CERT_PKIXVerifyCert
> is expected to fill in, the API must say that clearly.

I think it's obvious that that's not the case, because we don't expect a single pointer as a result. It's a kind of input parameter as well, because the caller provides a list of desired output parameter types.


> I agree that CERT_PKIXVerifyCert itself must
> initialize any output values (values that CERT_PKIXVerifyCert is expected 
> to fill in) upon which it is later going to depend.  

Thanks Nelson. 
This is the answer to our question.


> Personally, I have always believed that the best strategy is not to modify 
> output parameters until you're sure that you're going to really do the output.
> That is, I would prefer that the output parameter never be filled in and then
> subsequently destroyed.  I would prefer that it be filled in AFTER it is clear
> that the operation is going to succeed in its entirety.  That may mean keeping
> a local copy of the output parameters private to the function until it is all
> done, then copying out the value of those output parameters to the caller's
> copy when the operation is done.  But, having said that, I do not require 
> that behavior for this function.

What you describe here is a nice excursion to avoiding bad output parameters on failure exits, but I agree we don't need to worry about that right now.
Alexei or Nelson, can you please review?
Attachment #314769 - Flags: superreview?(nelson)
Attachment #314769 - Flags: review?(alexei.volkov.bugs)
Comment on attachment 314769 [details] [diff] [review]
Patch Alternative A, v1 - zero out all output params


>+void
>+cert_pkixInitValOutParam(CERTValOutParam *params)
>+{
>+    CERTValOutParam *i;
>+
>+    if (!params)
>+        return;
>+
>+    for (i = params; i->type != cert_po_end; i++) {
>+        memset(&i->value, 0, sizeof(CERTValParamOutValue));
>+    }
>+}

memset is way too costly to use this way.
I suggest something like:

>+    CERTValOutParam *i;
>+    static const CERTValParamOutValue zero;
>+
>+    if (!params)
>+        return;
>+
>+    for (i = params; i->type != cert_po_end; i++) {
>+        i->value = zero;
>+    }
>+}
Attachment #314769 - Flags: superreview?(nelson) → superreview-
so, a quick scan of top crashes found this bug. the problem is that there are no other threads. i'm betting that this is crashing after all other threads have quit. psm is probably missing a message or nss is probably stuck in a position where it won't respond until it does this.

see:
bp-a630028c-06a1-11dd-b776-001321b13766
bp-e58d9c1c-06fd-11dd-8ce7-001cc45a2c28
for examples (just go to the raw data and note the lack of other threads, or show/hide other threads and note that there are no stacks for any other threads).

afaict, there are plenty of null checks floating around.
Severity: normal → critical
Keywords: crash, topcrash
Summary: Crash destroying cert in cert_pkixDestroyValOutParam → Crash destroying cert in [@ CERT_DestroyCertificate - cert_pkixDestroyValOutParam]
nominating since I hit this crash this morning when starting up. Users who software update and have either MP or authentication dialogs pop up will likely hit this crash frequently.
Flags: blocking1.9?
Comment on attachment 314769 [details] [diff] [review]
Patch Alternative A, v1 - zero out all output params

The patch will zero a pointer to initialized nss log, that user passed in with output parameters.
Attachment #314769 - Flags: review?(alexei.volkov.bugs) → review-
We decided on todays NSS meeting that the safest short term solution to the problem is to make caller(PSM) initialize the pointer(value.pointer.cert) to root cert that get returned by CERT_PKIXVerifyCert. I'm working on the patch that will solve it for the long run.
Alternative patch. Sets pointer in output parameters only in case of success. Otherwise cert and cert chain will be destroyed using temporary pointers.
Attachment #314904 - Flags: review?(nelson)
Attached patch wrong patch (obsolete) — Splinter Review
Similar to v1. Patch initializes only specific pointers.
Attachment #314769 - Attachment is obsolete: true
Attachment #314908 - Flags: review?(nelson)
Attachment #314904 - Flags: review?(kengert)
Attachment #314908 - Flags: review?(kengert)
Comment on attachment 314908 [details] [diff] [review]
wrong patch

Incorrect patch.
Attachment #314908 - Attachment is obsolete: true
Attachment #314908 - Flags: review?(nelson)
Attachment #314908 - Flags: review?(kengert)
Same as v3 but without extra code.
Attachment #314909 - Flags: review?(nelson)
Attachment #314909 - Flags: review?(kengert)
The safest and most correct way to fix this for Firefox is to have PSM initialize the output parameters it is requesting before it calls Libpkix.

We'll work out on the right semantic for the verify call itself, but in anycase pre-initialized data should be safe.

bob
Attachment #314769 - Attachment description: Patch v1 → Patch Alternative A, v1 - zero out all output params
Comment on attachment 314904 [details] [diff] [review]
Patch Alternative B, v1 - Don't modify output params until full success is assured

This patch is not a different version of the preceeding one, but rather is a complete alternative approach.
Attachment #314904 - Attachment description: Patch v2 → Patch Alternative B, v1 - Don't modify output params until full success is assured
Attachment #314908 - Attachment description: Patch v3 → wrong patch
Comment on attachment 314909 [details] [diff] [review]
Patch Alternative A, v2 - zero out SOME output params

r=nelson, with one suggestion (see below).
This patch is lower risk than the others proposed so far.
I will study Alternative B, but it will take a little longer.

>+/* Initialize output parameter pointers with NULL. */
>+void
>+cert_pkixInitValOutParam(CERTValOutParam *params)
>+{
>+    CERTValOutParam *i;
>+
>+    if (params == NULL) {
>+        return;
>+    }
>+    for (i = params; i->type != cert_po_end; i++) {
>+        switch (i->type) {
>+        case cert_po_trustAnchor:
>+            i->value.pointer.cert = NULL;
>+            break;
>+
>+        case cert_po_certList:
>+            i->value.pointer.chain = NULL;

here add a break, and a default case with a break.
This will reduce warnings.

>+        }
>+    }
>+}
Attachment #314909 - Attachment description: Patch v4 → Patch Alternative A, v2 - zero out SOME output params
Attachment #314909 - Flags: review?(nelson) → review+
Bob, can you please review the patch in bug 428415?

Marcia, it will be sufficient to fix bug 428415 for Firefox 3 (Mozilla code).

I've filed bug 428415 to apply the "bandaid" to PSM code.
Depends on: 428415
Comment on attachment 314909 [details] [diff] [review]
Patch Alternative A, v2 - zero out SOME output params

I think this patch is not complete.

What about cert_po_nbioContext and value.pointer.p ?

What about cert_po_policyOID and value.array.oids =

cert_po_errorLog and value.pointer.log =

cert_po_extendedKeyusage and value.array.oids ?



What happens if future programmers add additional "types"?

We could use a kind of runtime/compile time assertion, which assures that "cert_po_max == cert_po_extendedKeyusage".

This will remind programmers that whenever a new parameter gets added, it is necessary to verify the correctness of cert_pkixInitValOutParam.
Attachment #314909 - Flags: review?(kengert) → review-
I think alternative B can not be seen as an "alternative".

I think even if you take B, you should still do A.

I think A is the real safe patch, that will work in the future, regardless whether or not programmers are careful enough to do something like B.

I see B as optional.
Comment on attachment 314904 [details] [diff] [review]
Patch Alternative B, v1 - Don't modify output params until full success is assured

r=nelson with the changes shown below.

>+    CERTCertificate *     outRootCert = NULL;
>+    CERTCertList *        outCertChain = NULL;
>+    CERTCertificate **    outParamCertPrt = NULL;
>+    CERTCertList **       outParamChainPrt = NULL;

Please change "Prt" to "Ptr" or "Addr" in those two names above
(and where those names are used below).

>     if (error != NULL) {
  ...
>+    } else {
>+        if (outRootCert)
>+            *outParamCertPrt = outRootCert;
>+        if (outCertChain)
>+            *outParamChainPrt = outCertChain;
>     }

Those two if statements above need to test whether the pointer is NULL,
not whether the value to be stored there is NULL.  e.g. 

>+        if (outParamCertPtr)
>+            *outParamCertPtr = outRootCert;
>+        if (outParamChainPtr)
>+            *outParamChainPtr = outCertChain;

The reason is that, if the function succeeds, those output params MUST
be filled in with some value, even if it's NULL.  They must not be left
uninitialized, just because the output value was NULL.
Attachment #314904 - Flags: review?(nelson) → review+
I think the solution is to define the API such that it is the caller's 
responsibility to initialize the output params.  I don't think that's a 
band-aid.  I think that's what we agreed to in this morning's meeting.
Blocking as this could affect the software update experience.  Need to make
sure we maintain user confidence there.  

Please request approval1.9? as soon as reviews are completed so we can get this
in.

Flags: blocking1.9? → blocking1.9+
Damon,

We've decided the fix is the patch to bug 428415, so please mark that bug
as blocking 1.9, instead of, or in addition to, this one.  
I've also sent email to Damon to explain comment 19 and comment 25 further.
So, if bug 428415 fixed this, can this be resolved as fixed?
Minusing per comment 25
Flags: blocking1.9+ → blocking1.9-
So, I went ahead and approved bug 428415 .  I'll also mark it as blocking. 
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
This NSS bug is not fixed.  
However, it is not an issue for FF3 any more.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Severity: critical → normal
Keywords: crash, topcrash
Note that this bug also causes a crash in vfychain when the -t and -v options
are both used, and the cert does not chain to the trust anchor.  

I think Kai has a good point in comment 20.  The fix for this bug needs to 
handle all the possible pointer-oriented output parameters.  
Target Milestone: 3.12 → 3.12.1
Target Milestone: 3.12.1 → 3.12.2
Attachment #314909 - Attachment is obsolete: true
Attachment #314904 - Flags: review?(kaie)
Comment on attachment 314904 [details] [diff] [review]
Patch Alternative B, v1 - Don't modify output params until full success is assured

Clearing my review request.
Nelson has requested a new patch in comment 22, I can review the next patch. (I think I already stated my preference for this patch in above comments.)
Actually, Kai, my comment 22 gives permission to commit that patch, after 
changing it as shown in that comment.
fine with me :-)
I'm marking this bug as WORKSFORME as bug crashlog signature didn't appear from a long time (over half year) in Firefox.
Status: REOPENED → RESOLVED
Crash Signature: [@ CERT_DestroyCertificate - cert_pkixDestroyValOutParam]
Closed: 12 years ago3 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.