Closed Bug 433386 Opened 16 years ago Closed 16 years ago

when system clock is off by more than two days, OSCP check fails, can result in crash if user tries to view certificate [@ SECITEM_CompareItem_Util] [@ memcmp]

Categories

(NSS :: Libraries, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mconnor, Assigned: KaiE)

References

Details

(4 keywords)

Crash Data

Attachments

(4 files, 1 obsolete file)

Got a new PC, system date was off by a number of days.  On (seemingly) EV sites only, but could be anything with OCSP, I would get crashes on pageload OR viewing the cert OR on shutdown.  The EV UI would also fail to display. beltzner did some poking, looks like when its > 2 days off its 100% reproducible.  Seems like the type of thing we should just fix, since its entirely non-obvious what's causing it.  Probably also want a technote on it for SUMO...

Some stacks:

246f66de-2064-11dd-a8ec-001cc45a2ce4
767c7626-2062-11dd-ada7-0013211cbf8a
9dcd091c-2061-11dd-a2f9-001321b13766
dd4ee885-2060-11dd-b536-001cc4e2bf68
5be344ae-2060-11dd-9c05-001cc4e2bf68
ef36b0ca-205f-11dd-bd8d-001cc4e2bf68
Flags: wanted1.9.0.x+
Flags: wanted-next+
More stacks:

http://crash-stats.mozilla.com/report/index/71c5df1e-2065-11dd-abbc-0013211cbf8a
http://crash-stats.mozilla.com/report/index/95bd741c-2065-11dd-a7d8-001321b13766

The stacks always look the same:

0  	mozcrt19.dll  	memcmp  	
1 	nssutil3.dll 	SECITEM_CompareItem_Util 	mozilla/security/nss/lib/util/secitem.c:165
2 	nss3.dll 	ocsp_CacheKeyCompareFunction 	mozilla/security/nss/lib/certhigh/ocsp.c:324
3 	plds4.dll 	PL_HashTableRawLookup 	mozilla/nsprpub/lib/ds/plhash.c:200
4 	nss3.dll 	nss3.dll@0xa101f 	

Implies that the problem is here:

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/util/secitem.c&rev=1.14&mark=165#165
Summary: Seemingly OCSP-related crashes when system clock is off by a few days → Seemingly OCSP-related crashes when system clock is off by more than two days
Summary: Seemingly OCSP-related crashes when system clock is off by more than two days → OCSP check crashes when system clock is off by more than two days (happens when looking into sites with EV SSL certificates)
I should note that the crash only happened to me when I tried to view the certificate, though mconnor claims that he crashed at other times.
Summary: OCSP check crashes when system clock is off by more than two days (happens when looking into sites with EV SSL certificates) → when system clock is off by more than two days, OSCP check fails, can result in crash if user tries to view certificate
Kai, This crash implies that the pointers and/or lengths in the OCSP cache
are bogus.  The function that crashes is well protected against NULL pointers
and zero lengths, but bogus non-NULL pointers will crash it.  
Since you wrote the OCSP cache code, please take a look.
Assignee: nobody → kengert
Mike & Mike,  When you say "off by > 2 days", 
is that system clock in the past? or the future?

Kai, I'm guessing that some cache entry has been freed, or otherwise 
overwritten, while the cache's PLHashTable still points to it.
In the past.  I was set to May 3rd on the new machine, never noticed (and, oddly enough, the internet time sync was failing)
Keywords: crash
Summary: when system clock is off by more than two days, OSCP check fails, can result in crash if user tries to view certificate → when system clock is off by more than two days, OSCP check fails, can result in crash if user tries to view certificate [@ SECITEM_CompareItem_Util] [@ memcmp]
Chris, when I go to the URL you gave in comment 6, I get a page that has 
"Your system clock is cnot correct" in the page outline at the top, 
but no corresponding section appears in the body of the document. 

In the page outline, that heading appears between "nsBrowserOpt.dll" and
"Other causes", but in the body of the page, the section "Other causes"
follows "nsBorwsefrOpt.dll" immediately with no intervening section about
the system clock.  

An examination of the document source shows that there is some stuff there
in the source, between those two sections, but none of it displays on the 
browser page window.  
[@ memcmp] is topcrash #6 for Firefox 3 RC1.  I picked five of them at random and they were all being called by SECITEM_CompareItem_Util.

http://crash-stats.mozilla.com/topcrasher/byversion/Firefox/3.0
Keywords: topcrash
Mike & Mike (or anybody who has reproduced it):
Do you have one or more specific URLs with which to reproduce this?
(In reply to comment #9)
> Mike & Mike (or anybody who has reproduced it):
> Do you have one or more specific URLs with which to reproduce this?

STR:

1) Set system clock back 2 weeks
2) Open Firefox
3) Visit https://www.godaddy.com
  - Note that, since the OCSP response failed to validate, being out of range, you do not see EV treatment, just normal SSL
4) Page Info->Security->View Certificate
5) Crash
Does the work in bug 433594 shed any light on this?  I recognize that the code in question is in a different location, but given that both bugs concern code that deals with OCSP checks (and their failures) I thought it would be particularly smashing if one fix covered both, or at least lead to an obvious fix here as well.
Sorry that it took me so long until I've started to look at this bug.
I'm able to reproduce.
FWIW, instead of changing system time, this little patch also helps me to reproduce the crash.
Comment on attachment 321965 [details] [diff] [review]
fake time patch for testing on linux

This patch is a keeper!
Thanks Kai
Attached patch (minimal) Patch v1 (obsolete) — Splinter Review
This patch fixes the crash for me.

The crash is caused by destroying a pointer while still holding to it elsewhere.
- pkix_pl_OcspResponse_GetStatusForCert derives the status from the response,
  but at the same time creates a response object.
  It correctly sets the flag that our certID object got consumed for
  the new cache entry
- Function PKIX_PL_OcspCertID_RememberOCSPProcessingFailure
  was meant to be called only when we have not yet created a cache entry.
  The code makes the incorrect assumption that (!passed) indicates
  "no cache entry created yet". That's wrong. And we don't check whether
  the certID got already consumed. (It was.)
- We call PKIX_PL_OcspCertID_RememberOCSPProcessingFailure,
  which reuses the same flag variable for the consumption status,
  which gets set to FALSE when some function deeper down in OCSP processing
  prepares the out parameter
- no second cache entry gets created (correct), but the flag "was consumed"
  incorrectly got reverted to "no"
- we destroy the certID object, although it's still in possession by the 
  cache
- we crash on the next operation that accesses the cache

This bug got introduced when I cache-enabled libpkix (wrapper code around the OCSP cache).
While the previous comment lists a minimal patch, Nelson has complained about the concept of using a decoupled "pointer was consumed" variable, and I have to agree. He proposes to simply set the pointer variable to NULL as soon as we no longer own the pointer.

Here is an alternative patch, which is probably cleaner, but also larger.

So the question is:
Should we take the minimal patch or the cleaner patch for NSS 3.12.0 / Firefox 3.0.0 ?
Attachment #321986 - Flags: review?(nelson)
Attachment #321990 - Flags: review?(nelson)
Kai's comment 14 suggests that this is another manifestation of bug 435007.
The "(minimal) patch v1", attachment 321986 [details] [diff] [review], above conflicts with the patch 
attached to that bug, which (I'm sure) is why Kai didn't request review for 
this patch.

The second patch attached here appears to be a merger of the two patches.
I will review it.
Comment on attachment 321990 [details] [diff] [review]
(cleaner) Patch v2

This patch appears to be correct, and is what we want for NSS 3.12.1 and beyond.  
However, if we're going to spin another NSS 3.12.0 release candidate for FF3 RC2, Maybe the "minimal patch v1" is less risk for that.  Let's ask Bob.

Also note that if we respin another NSS 3.12.0 Release Candidate, we MUST also pick up the fix for bug 433594.
Attachment #321990 - Flags: review?(nelson) → review+
Comment on attachment 321986 [details] [diff] [review]
(minimal) Patch v1 


>-        if (!passed && cid) {
>+        if (!passed && cid && !cid->certIDWasConsumed) {

Actually, it would be much safer to use

>+        if (!passed && cid && cid->certID && !cid->certIDWasConsumed) {

Please do that.  r+ for the 3.12.0 branch with that additional change.
I'm asking Bob for SR for that branch.
Since either of the two patches to this bug depend on the patch for 
bug 433594, I'm asking Bob to SR that bug's patch, too.
Attachment #321986 - Flags: superreview?(rrelyea)
Attachment #321986 - Flags: review?(nelson)
Attachment #321986 - Flags: review+
Marking this bug as depending on bug 433594, since the fix to that bug
is pre-requisite to the patch(es) for this bug.
Depends on: 433594
Adding Nelson's requested change, keeping Nelsons review flags.
Attachment #321986 - Attachment is obsolete: true
Attachment #321998 - Flags: superreview?(rrelyea)
Attachment #321998 - Flags: review+
Attachment #321986 - Flags: superreview?(rrelyea)
(In reply to comment #14)

> - We call PKIX_PL_OcspCertID_RememberOCSPProcessingFailure,
>   which reuses the same flag variable for the consumption status,
>   which gets set to FALSE when some function deeper down in OCSP processing
>   prepares the out parameter
> - no second cache entry gets created (correct), but the flag "was consumed"
>   incorrectly got reverted to "no"

The patch that eliminates the "was consumed" flag presumably fixes the 
problems where the flag's value was incorrect.  Does the problem of 
incorrect values in "was consumed" flag remain with the "minimal patch v1" ?
If the flag is set incorrectly, then adding tests for the value of that 
flag in various places won't actually fix things. 

> - we destroy the certID object, although it's still in possession by the 
>   cache
> - we crash on the next operation that accesses the cache
> 
> This bug got introduced when I cache-enabled libpkix (wrapper code around the
> OCSP cache).

If "minimal patch v1" doesn't fix that, I must withdraw my r+ from it.
Nelson, the minimal patch avoids the second function call, if our object got already consumed.
It's the second function call that may revert the flag to an incorrect value.

I think the minimal patch is fine.
Comment on attachment 321998 [details] [diff] [review]
(minimal) Patch v1 b

r+ yes, less take this one in 3.12.0

the other is definately appropriate for 3.12.1

bob
Attachment #321998 - Flags: superreview?(rrelyea) → superreview+
Blocks: 435959
Comment on attachment 321998 [details] [diff] [review]
(minimal) Patch v1 b

a=shaver for landing on Firefox-destined branch, thanks.
Attachment #321998 - Flags: approval1.9+
checked in to 3.12.0 branch for rc4

Checking in libpkix/pkix/checker/pkix_ocspchecker.c;
/cvsroot/mozilla/security/nss/lib/libpkix/pkix/checker/pkix_ocspchecker.c,v  <--  pkix_ocspchecker.c
new revision: 1.8.2.1; previous revision: 1.8
done
checked in the cleaner patch to 3.12 trunk for 3.12.1
marking fixed

Checking in pkix/checker/pkix_ocspchecker.c;
/cvsroot/mozilla/security/nss/lib/libpkix/pkix/checker/pkix_ocspchecker.c,v  <--  pkix_ocspchecker.c
new revision: 1.9; previous revision: 1.8
done
Checking in pkix_pl_nss/pki/pkix_pl_ocspcertid.c;
/cvsroot/mozilla/security/nss/lib/libpkix/pkix_pl_nss/pki/pkix_pl_ocspcertid.c,v  <--  pkix_pl_ocspcertid.c
new revision: 1.4; previous revision: 1.3
done
Checking in pkix_pl_nss/pki/pkix_pl_ocspcertid.h;
/cvsroot/mozilla/security/nss/lib/libpkix/pkix_pl_nss/pki/pkix_pl_ocspcertid.h,v  <--  pkix_pl_ocspcertid.h
new revision: 1.2; previous revision: 1.1
done
Checking in pkix_pl_nss/pki/pkix_pl_ocspresponse.c;
/cvsroot/mozilla/security/nss/lib/libpkix/pkix_pl_nss/pki/pkix_pl_ocspresponse.c,v  <--  pkix_pl_ocspresponse.c
new revision: 1.10; previous revision: 1.9
done
Status: NEW → RESOLVED
Closed: 16 years ago
OS: Windows Vista → All
Hardware: PC → All
Resolution: --- → FIXED
Target Milestone: --- → 3.12
as mentioned in bug 436903 this bug moved to #6 topcrash on RC1.  we should watch for a reduction on the next release candidate.
Flags: wanted1.9.0.x+
Crash Signature: [@ SECITEM_CompareItem_Util] [@ memcmp]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: