Closed
Bug 433594
Opened 16 years ago
Closed 16 years ago
Crash destroying OCSP Cert ID [@ CERT_DestroyOCSPCertID ]
Categories
(NSS :: Libraries, defect, P1)
Tracking
(Not tracked)
VERIFIED
FIXED
3.12.1
People
(Reporter: fittysix, Assigned: alvolkov.bgs)
References
()
Details
(Keywords: crash, regression, Whiteboard: PKIX [RC2+])
Crash Data
Attachments
(2 files, 1 obsolete file)
1.48 KB,
application/octet-stream
|
Details | |
684 bytes,
patch
|
julien.pierre
:
review+
KaiE
:
review+
shaver
:
approval1.9+
|
Details | Diff | Splinter Review |
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/2008051206 Firefox/3.0 (rc1 candidate 1) My co-worker is getting this crash every time he visits the URL in the bug report. Crash id: 47c555b0-2127-11dd-b748-001cc45a2ce4 Paypal seems to have no problems, which is the same CA, and thus far this remains the only website that he has experienced this on. I'll try to get a stack trace some time, but as this is his working machine it might not be possible to get that right away. Possibly related to Bug 428038
Reporter | ||
Updated•16 years ago
|
Summary: Crash destroying cert [@ CERT_DestroyOCSPCertID ] → Crash destroying OSCP Cert ID [@ CERT_DestroyOCSPCertID ]
Comment 1•16 years ago
|
||
Stack contains 0 nss3.dll CERT_DestroyOCSPCertID mozilla/security/nss/lib/certhigh/ocsp.c:1519 1 nss3.dll pkix_pl_OcspCertID_Destroy mozilla/security/nss/lib/libpkix/pkix_pl_nss/pki/pkix_pl_ocspcertid.c:69 2 nss3.dll PKIX_PL_Object_DecRef mozilla/security/nss/lib/libpkix/pkix_pl_nss/system/pkix_pl_object.c:936 3 nss3.dll PKIX_PL_OcspCertID_Create mozilla/security/nss/lib/libpkix/pkix_pl_nss/pki/pkix_pl_ocspcertid.c:178 4 nss3.dll pkix_OcspChecker_Check mozilla/security/nss/lib/libpkix/pkix/checker/pkix_ocspchecker.c:183 5 nss3.dll pkix_RevCheckCert mozilla/security/nss/lib/libpkix/pkix/top/pkix_validate.c:390 6 nss3.dll pkix_CheckChain mozilla/security/nss/lib/libpkix/pkix/top/pkix_validate.c:978 7 nss3.dll pkix_Build_ValidateEntireChain mozilla/security/nss/lib/libpkix/pkix/top/pkix_build.c:1632 8 nss3.dll pkix_BuildForwardDepthFirstSearch mozilla/security/nss/lib/libpkix/pkix/top/pkix_build.c:3047 9 nss3.dll pkix_Build_InitiateBuildChain mozilla/security/nss/lib/libpkix/pkix/top/pkix_build.c:4194 10 nss3.dll PKIX_BuildChain mozilla/security/nss/lib/libpkix/pkix/top/pkix_build.c:4384 11 nss3.dll CERT_PKIXVerifyCert mozilla/security/nss/lib/certhigh/certvfypkix.c:2138 12 xul.dll xul.dll@0x2efb2a 13 xul.dll nsNSSCertificate::getValidEVOidTag mozilla/security/manager/ssl/src/nsIdentityChecking.cpp:970 14 xul.dll nsNSSCertificate::GetIsExtendedValidation mozilla/security/manager/ssl/src/nsIdentityChecking.cpp:996 15 xul.dll AuthCertificateCallback mozilla/security/manager/ssl/src/nsNSSCallbacks.cpp:979 16 ssl3.dll ssl3_HandleCertificate mozilla/security/nss/lib/ssl/ssl3con.c:7260 17 ssl3.dll ssl3_HandleHandshakeMessage mozilla/security/nss/lib/ssl/ssl3con.c:7938 18 ssl3.dll ssl3_HandleHandshake mozilla/security/nss/lib/ssl/ssl3con.c:8062 19 ssl3.dll ssl3_HandleRecord mozilla/security/nss/lib/ssl/ssl3con.c:8325 20 ssl3.dll ssl3_GatherCompleteHandshake mozilla/security/nss/lib/ssl/ssl3gthr.c:206 21 ssl3.dll ssl_GatherRecord1stHandshake mozilla/security/nss/lib/ssl/sslcon.c:1258 22 ssl3.dll ssl_Do1stHandshake mozilla/security/nss/lib/ssl/sslsecur.c:151 23 ssl3.dll ssl_SecureSend mozilla/security/nss/lib/ssl/sslsecur.c:1152 24 ssl3.dll ssl_SecureWrite mozilla/security/nss/lib/ssl/sslsecur.c:1197 25 ssl3.dll ssl_Write mozilla/security/nss/lib/ssl/sslsock.c:1487 26 xul.dll nsSSLThread::Run mozilla/security/manager/ssl/src/nsSSLThread.cpp:1029
Assignee: nobody → alexei.volkov.bugs
Priority: -- → P1
Whiteboard: PKIX
Version: unspecified → trunk
Comment 2•16 years ago
|
||
Does Not appear related to Bug 428038.
Comment 3•16 years ago
|
||
Ryan, is your co-worker's system clock wrong, perhaps some hours or days in the past? Is his system time zone wrong?
Comment 4•16 years ago
|
||
www.techdata.com is a misconfigured server. It has a Verisign class 3 SSL server cert, but IINM, it is not configured to send out the proper intermediate CA certificate for the Verisign class 3 Intermediate CA. The fact that the browser got as far as calling libPKIX means that the non-PKIX cert verification passed, which in turn means that the browser had previously imported and saved that Verisign class 3 Intermediate CA. None of this excuses the crash. It's just helpful info to understand the circumstances in which the crash occurred.
Reporter | ||
Comment 5•16 years ago
|
||
The system clock is accurate, and synced to the windows internet time server. The time zone appears to be correct as well. I should probably mention, we tried completely uninstalling firefox, deleting all program directories and removing all appdata/local settings mozilla directories, and the problem remained.
Comment 6•16 years ago
|
||
With the attached cert installed in my cert DB, I ran the commands vfyserv -d $HOME/.netscape -c -o www.techdata.com NSS_ENABLE_PKIX_VERIFY=1 vfyserv -d $HOME/.netscape -c -o www.techdata.com At first, I got this error: Error in function PR_Write: -8071 - The OCSP server experienced an internal error. But then it changed, and now I get: Handshake Complete: SERVER CONFIGURED CORRECTLY I wonder if the server configuation was changed. Ryan, can you still reproduce?
Reporter | ||
Comment 7•16 years ago
|
||
Still seems to happen. One odd note, it crashes with the same signature when he looks at the bookmark properties: bp-b08e17a8-2140-11dd-b7f3-001321b13766
Comment 8•16 years ago
|
||
I cannot reproduce this with any of NSS's test programs. They all complete with out crashing.
Reporter | ||
Comment 9•16 years ago
|
||
I've just managed to reproduce this on a different machine, this one is a vista machine that had never accessed techdata before. The trace on this one is a little shorter. bp-eb0e5aee-2149-11dd-8d0e-0013211cbf8a When I first visit the page in that URL I get Error code: sec_error_unknown_issuer I can blindly add an exception, but if I click the view button I get the crash.
Reporter | ||
Comment 10•16 years ago
|
||
Hmm, so ignoring the whole sec_error_unknown_issuer thing, which I assume is the misconfiguration on their part. I am able to view the cert without crashing on Beta 5, but I do get the crash on a build from 04/23/08, which is the oldest build I happened to conveniently have on my hard drive. There's still about a month of unknown between there, so I'll see if I can narrow down a regression range a little, if that can be found then it might be a little easier to figure out what exactly causes the crash.
Keywords: regression
Reporter | ||
Comment 11•16 years ago
|
||
So, I've come up with this: http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2008-04-08+05%3A00%3A00&maxdate=2008-04-09+07%3A00%3A00&cvsroot=%2Fcvsroot The nightly 2008-04-08-05 worked, 2008-04-09-07 doesn't. The ones that immediately stand out are Bug 406755 and bug 426681, but that's a pretty big list.
Comment 12•16 years ago
|
||
Only two of the files changed in that huge list of checkins are NSS. We cannot reproduce this with any of the NSS test tools. Not with vfyserv, vfychain, nor ocspclnt. Reproducing this bug seems to require PSM. While that's not completely conclusive, it suggests strongly that this problem involves something that PSM does that NSS does not.
Reporter | ||
Comment 13•16 years ago
|
||
Got it, regression range in comment 11 is a little wrong. Bug 375019 http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/libpkix/pkix_pl_nss/pki/pkix_pl_ocspcertid.c&mark=170,171,178#164 The CERT_CreateOCSPCertID is returning null because of sec_error_unknown_issuer This throws the cleanup, which calls PKIX_DECREF(cid) which calls the destructor: http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/libpkix/pkix_pl_nss/pki/pkix_pl_ocspcertid.c&mark=69#52 which tries to destroy the non-existant certID
Comment 14•16 years ago
|
||
Ryan, Here are a few thoughts. 1. Good Job. You found a real bug, and your description of the sequence of events that leads to it also suggests a second bug. Your results show that if CERT_CreateOCSPCertID fails for ANY reason, when called from PKIX_PL_OcspCertID_Create, the result will be a crash. I'm surprised that our automated testing didn't catch that. We do testing that injects allocation failures, simulating out of memory conditions, to excersize our error paths. A memory allocation failure during the execution of CERT_CreateOCSPCertID would have caused the same result. But we haven't encountered that. Makes me wonder if our allocation failure injection is working as well as we think it is. The fix to the immediate cause of the crash is now obvious. There are (at least) two ways we could fix it. I recomend changing CERT_DestroyOCSPCertID to check its input argument for NULL. It should also set an error code when returning SECFailure. 2. I wonder, did you see this in a debugger? Did you actually see the sec_error_unknown_issuer error occur when ocsp_CreateCertID called CERT_FindCertIssuer? Or is that an inference/deduction? I ask this because that should never happen, and if it did happen, then there is another bug somewhere else, in the code path that preceded the call to pkix_RevCheckCert. That error is occuring while we're building the OCSP request for a cert to be checked for revocation. That error code implies that we're attempting to check the revocation status of a cert for which we do not have the issuer certificate. That's a serious no-no. We should NEVER attempt to do a revocation check on a cert that has not already been validated (except for revocation). Revocation checking should be the last step, after the whole chain has passed the test of having valid issuers, valid signatures, not expired, etc., in a chain that is anchored by a trusted root. If we were following that rule, we would have already found an issuer cert for the cert that we're checking with OCSP. Consequently, we should not do the OCSP check until after we've found a valid issuer for the cert. (This implies that the revocation checks should proceed from root to leaf, and not in the opposite order). So, it should be impossible to experience an "unknown issuer" error while doing OCSP, because the issuer cert should already have been checked and found valid. So, I ask if you're sure that the error experienced in the OCSP code was SEC_ERROR_UNKNOWN_ISSUER, because if so, we've got a bug in a higher layer of the code. It occurs to me that PSM's new "security exceptions" for certificates may be defeating this rule. I wonder if perhaps you had created a "security exception" for this server's certificate, to work around the fact that the issuer cert was not being sent by the https server. (?) I wonder if PSM is attempting to do EV certificate checks on certs that have "security exceptions". That would be another no-no. If that's what's going on, then we have a PSM bug here, too.
Reporter | ||
Comment 15•16 years ago
|
||
I'm actually surprised this bug never got hit before considering it happens on all OCSP errors I did see it when debugging in visual studio. When I click the view certificate button (easiest way to invoke it) it runs through pkix_pl_ocspcertid.c twice, the first time it goes through completely valid. The second time is when it fails by falling in to this http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/certhigh/ocsp.c&mark=1665-1667#1660 Which eventually ends up here: http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/certhigh/certvfy.c&mark=266#260
Comment 16•16 years ago
|
||
Comment on attachment 321531 [details] [diff] [review] Immediate quick fix It's certainly not true that this "happens on all OCSP errors". It happens only in the relatively rare case where OCSP is unable to successfully construct the OCSP request to send to the server, because it is unable to construct the "certID". Please fix this in CERT_DestroyOCSPCertID. CERT_DestroyOCSPCertID has 3 callers. I'd rather fix it once in that function, than multiple times in all the callers of that function.
Attachment #321531 -
Flags: review-
Comment 17•16 years ago
|
||
Ryan, did you create a "security exception" for this certificate?
Target Milestone: --- → 3.12.1
Comment 18•16 years ago
|
||
In comment 15, Ryan wrote: > When I click the view certificate button (easiest way to invoke it) it > runs through pkix_pl_ocspcertid.c twice, the first time it goes through > completely valid. The second time is when it fails [...] Is this when viewing the certificate for https://www.techdata.com/ ? Are you clicking the view certificate button in the "Add Security Exception" dialog box? Or elsewhere? Do the two calls to PKIX_PL_OcspCertID_Create (one of which precedes a successful test, and one of which precedes a failure) both get the same certificate for the first argument? Or are they testing different certificates? (If I could reproduce this, I wouldn't ask all these questions. Sorry.)
Comment 19•16 years ago
|
||
Actually, I'm willing to take both patches, this and the previous one. The other patch is not wrong. It's merely not as complete as I'd like. I think this bug should be nominated to be fixed in a new NSS 3.12 release candidate, and get into FF3.
Attachment #321538 -
Flags: review?(julien.pierre.boogz)
Reporter | ||
Comment 20•16 years ago
|
||
I'm using techdata for testing, and clicking the view button in the 'add security exception' dialog. I don't have any security exceptions on the machine that I am debugging on, it is a little strange you can't reproduce it though, since I've managed it on all but 1 windows machine (a machine with a very old profile that had previously been to techdata). It seems to always reproduce on a completely fresh profile, though it might make a difference that I'm in Canada, they could have a different server setup. the nssCert of the 2 certs (hopefully bugzilla formats these nicely) pass: 0x086a4eb0 {arena=0x08664ff0 subjectName=0x086a8740 "CN=VeriSign Class 3 Extended Validation SSL CA,OU=Terms of use at https://www.verisign.com/rpa (c)06,OU=VeriSign Trust Network,O="VeriSign, Inc.",C=US" issuerName=0x086addb8 "CN=VeriSign Class 3 Public Primary Certification Authority - G5,OU="(c) 2006 VeriSign, Inc. - For authorized use only",OU=VeriSign Trust Network,O="VeriSign, Inc.",C=US" . fail: 0x08695dc8 {arena=0x08696f30 subjectName=0x08697448 "CN=www.techdata.com,OU=web1-5,O=Tech Data Corporation,OID.2.5.4.9=5350 Tech Data Drive,L=Clearwater,ST=Florida,postalCode=33760,C=US,serialNumber=465173,OID.2.5.4.15="V1.0, Clause 5.(b)",OID.1.3.6.1.4.1.311.60.2.1.2=Florida,OID.1.3.6.1.4.1.311.60.2.1.3=US" issuerName = 0x08697548 "CN=VeriSign Class 3 Extended Validation SSL CA,OU=Terms of use at https://www.verisign.com/rpa (c)06,OU=VeriSign Trust Network,O="VeriSign, Inc.",C=US"
Comment 21•16 years ago
|
||
OK, I am now able to reproduce this crash with a fresh profile when I attempt to view the cert in the add exception dialog. Thanks. The information that you provided in comment 20 is very interesting! I have not yet attempted to reproduce those results. They show that libPKIX *IS* checking revocation in the desired order, progressing from the highest level intermediate CA towards the EE (leaf) cert. It verifies the certificate status for the issuer certificate first, and then attempts to verify the status of the EE (leaf) cert. But when it attempts to verify the status of the EE cert, it is unable to find the issuer cert that it just checked in the preceding OCSP check! So, the question is: how can it fail to find the very same issuer certificate that it JUST CHECKED? This is really a question for bug 434398.
Comment 22•16 years ago
|
||
Comment on attachment 321531 [details] [diff] [review] Immediate quick fix I'm willing to commit this patch together with the other one, when that is reviewed.
Attachment #321531 -
Flags: review- → review+
Reporter | ||
Comment 23•16 years ago
|
||
might as well throw it in the rc2? pile. There was a huge spike in these for some reason. Still fairly far down on the list, but reading through some of those reports it looks like there's a few affected sites. http://crash-stats.mozilla.com/report/list?range_unit=weeks&version=Firefox%3A3.0&range_value=2&signature=CERT_DestroyOCSPCertID
Flags: blocking1.9?
Summary: Crash destroying OSCP Cert ID [@ CERT_DestroyOCSPCertID ] → Crash destroying OCSP Cert ID [@ CERT_DestroyOCSPCertID ]
Whiteboard: PKIX → PKIX [RC2?]
Updated•16 years ago
|
Attachment #321538 -
Flags: review?(julien.pierre.boogz) → review+
Comment 24•16 years ago
|
||
Is there a way to add a crashtest for this fix?
Comment 25•16 years ago
|
||
Comment on attachment 321538 [details] [diff] [review] proposed patch v1 (checked in on trunk) Checking in certhigh/ocsp.c; new revision: 1.52; previous revision: 1.51
Attachment #321538 -
Attachment description: proposed patch v1 → proposed patch v1 (checked in on trunk)
Comment 26•16 years ago
|
||
Do we have data about the parameter value to the destroy function? Was it always NULL when you crashed?
Reporter | ||
Comment 27•16 years ago
|
||
on pkix_pl_OcspCertID_Destroy? If so, the PKIX_PL object was there, and appeared to be valid, but the magicheader was 0 IIRC. AFAICT this is normal behavior, since the certid should be null if it returns an error. For the CERT_DestroyOCSPCertID function, it appeared that the CertID was null, or somehow invalid. MSVC showed the proper data structure of the CertID, but the data was all "???" (same result you get with an invalid ptr). I can confirm this later, but I don't have access to my dev machine ATM.
Comment 28•16 years ago
|
||
Ryan, I'm only interested in the parameter value to function CERT_DestroyOCSPCertID. Thanks!
Comment 29•16 years ago
|
||
Comment on attachment 321538 [details] [diff] [review] proposed patch v1 (checked in on trunk) This patch is prerequisite to either of the two alternative patches for bug 433386, which are now being considered for a new NSS 3.12.0.x release candidate. Bob, please SR for the 3.12.0 branch.
Attachment #321538 -
Flags: superreview?(rrelyea)
Comment 30•16 years ago
|
||
Comment on attachment 321531 [details] [diff] [review] Immediate quick fix Although reviewed, apparently this patch did not get checked in, and it collides with the work in bug 433386, so I'm marking it as obsolete.
Comment 31•16 years ago
|
||
Comment on attachment 321538 [details] [diff] [review] proposed patch v1 (checked in on trunk) r=kengert
Attachment #321538 -
Flags: review+
Comment 32•16 years ago
|
||
Comment on attachment 321531 [details] [diff] [review] Immediate quick fix Yes, I was about to check in this patch when I realized just how flawed the whole certIDWasConsumed flag really is. That led to bug 435007. The patch for bug 435007 conflicts with (and supersedes) this patch, so I agree, this patch is obsolete.
Attachment #321531 -
Attachment is obsolete: true
Comment 33•16 years ago
|
||
OK, so what's the status here, then? Do we have a patch checked in on a NSS branch? Is this to be fixed by one of those other bugs? If we're to consider this for RC2, we'll need to know: - status and ETA to fix being ready - risk of regression
Comment 34•16 years ago
|
||
My understanding is that the patch we actually want is in bug 433386, where it (as well as a broader, post-FF3, patch) have already been reviewed and sr'd. I can't comment on the risk here, but I hope Kai, Nelson, or Bob can do so, as well as correct me if I mis-stated things above.
Comment 35•16 years ago
|
||
Johnathan & Mike, I believe there are two separate sets of "topcrash" bugs, with two separate patches, one in this bug, and one in bug 433386, where the attachment is named "(minimal) Patch v1 b". I believe both patches are necessary to fix the related top crashers. I believe both are VERY low risk (just adding detection of NULL pointers). The NSS team is willing to produce another NSS 3.12.0 Release Candidate for inclusion in FF3 RC2, if MoCo wants it (and it's beginning to sound like you do, but please confirm). Please tell us the date on which you would need that new NSS release candidate. One other crash bug in NSS is bug 432284. We would also supply a fix for that bug if we could reproduce it or knew the cause exactly, but we don't. It apparently only happened on one day. See the comments in that bug for info on it.
Updated•16 years ago
|
Flags: wanted1.9.0.x?
Flags: wanted-next+
Flags: blocking1.9?
Flags: blocking1.9-
Comment 36•16 years ago
|
||
Comment on attachment 321538 [details] [diff] [review] proposed patch v1 (checked in on trunk) Clearing the request for review from Bob (but please feel free to have a look). We have two reviews already.
Attachment #321538 -
Flags: superreview?(rrelyea)
Comment on attachment 321538 [details] [diff] [review] proposed patch v1 (checked in on trunk) a=shaver for landing on the Firefox 3 NSS branch thingy place. Thanks!
Attachment #321538 -
Flags: approval1.9+
Comment 39•16 years ago
|
||
checked in to 3.12.0 branch for 3.12.0 rc4 Checking in certhigh/ocsp.c; /cvsroot/mozilla/security/nss/lib/certhigh/ocsp.c,v <-- ocsp.c new revision: 1.50.6.1; previous revision: 1.50 done
Updated•16 years ago
|
Flags: wanted1.9.0.x?
Flags: blocking1.9-
Flags: blocking1.9+
Whiteboard: PKIX [RC2?] → PKIX [RC2+]
Comment 41•16 years ago
|
||
This is FIXED, aiui.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Comment 42•16 years ago
|
||
It is fixed on the branch that NSS maintains for FF3, but not yet on the trunk.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 43•16 years ago
|
||
Nelson, didn't you say in comment 25 that you have checked it in?
Comment 44•16 years ago
|
||
Uh, we're building Firefox 3 RC2 with the understanding that it was checked in on NSS 3.12 RC 4. If that's not the case, someone needs to raise a flag RFN.
Comment 45•16 years ago
|
||
Oh, nm, I misinterpreted comments.
Comment 46•16 years ago
|
||
D'oh!
Status: REOPENED → RESOLVED
Closed: 16 years ago → 16 years ago
Resolution: --- → FIXED
Comment 47•16 years ago
|
||
verified fixed using Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/2008052906 Firefox/3.0 and the testurl - no crash with the RC2 Candidate Build -> Verified Fixed
Status: RESOLVED → VERIFIED
Keywords: crash
Comment 48•16 years ago
|
||
Carsten, than you for verifying this bug. Please also check bug 435496, which is conjectured to be another duplicate of this bug (having the same cause), and see if it is also fixed by the fix for this bug. Thanks.
Updated•16 years ago
|
Blocks: NSS312regressions
Updated•13 years ago
|
Crash Signature: [@ CERT_DestroyOCSPCertID ]
You need to log in
before you can comment on or make changes to this bug.
Description
•