Last Comment Bug 433386 - when system clock is off by more than two days, OSCP check fails, can result in crash if user tries to view certificate [@ SECITEM_CompareItem_Util] [@ memcmp]
: when system clock is off by more than two days, OSCP check fails, can result ...
Status: RESOLVED FIXED
: crash, relnote, topcrash, user-doc-complete
Product: NSS
Classification: Components
Component: Libraries (show other bugs)
: trunk
: All All
: -- normal (vote)
: 3.12
Assigned To: Kai Engert (:kaie)
:
Mentors:
: 436903 (view as bug list)
Depends on: 433594
Blocks: 435959
  Show dependency treegraph
 
Reported: 2008-05-12 13:58 PDT by Mike Connor [:mconnor]
Modified: 2011-06-09 14:58 PDT (History)
11 users (show)
mconnor: wanted‑next+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
fake time patch for testing on linux (736 bytes, patch)
2008-05-21 10:00 PDT, Kai Engert (:kaie)
no flags Details | Diff | Review
(minimal) Patch v1 (1.34 KB, patch)
2008-05-21 12:24 PDT, Kai Engert (:kaie)
nelson: review+
Details | Diff | Review
set PR_Now() back two weeks on Windows (not for checkin) (587 bytes, patch)
2008-05-21 12:35 PDT, Nelson Bolyard (seldom reads bugmail)
no flags Details | Diff | Review
(cleaner) Patch v2 (6.59 KB, patch)
2008-05-21 12:38 PDT, Kai Engert (:kaie)
nelson: review+
Details | Diff | Review
(minimal) Patch v1 b (1.36 KB, patch)
2008-05-21 13:21 PDT, Kai Engert (:kaie)
kaie: review+
rrelyea: superreview+
shaver: approval1.9+
Details | Diff | Review

Description Mike Connor [:mconnor] 2008-05-12 13:58:48 PDT
Got a new PC, system date was off by a number of days.  On (seemingly) EV sites only, but could be anything with OCSP, I would get crashes on pageload OR viewing the cert OR on shutdown.  The EV UI would also fail to display. beltzner did some poking, looks like when its > 2 days off its 100% reproducible.  Seems like the type of thing we should just fix, since its entirely non-obvious what's causing it.  Probably also want a technote on it for SUMO...

Some stacks:

246f66de-2064-11dd-a8ec-001cc45a2ce4
767c7626-2062-11dd-ada7-0013211cbf8a
9dcd091c-2061-11dd-a2f9-001321b13766
dd4ee885-2060-11dd-b536-001cc4e2bf68
5be344ae-2060-11dd-9c05-001cc4e2bf68
ef36b0ca-205f-11dd-bd8d-001cc4e2bf68
Comment 1 Mike Beltzner [:beltzner, not reading bugmail] 2008-05-12 14:00:53 PDT
More stacks:

http://crash-stats.mozilla.com/report/index/71c5df1e-2065-11dd-abbc-0013211cbf8a
http://crash-stats.mozilla.com/report/index/95bd741c-2065-11dd-a7d8-001321b13766

The stacks always look the same:

0  	mozcrt19.dll  	memcmp  	
1 	nssutil3.dll 	SECITEM_CompareItem_Util 	mozilla/security/nss/lib/util/secitem.c:165
2 	nss3.dll 	ocsp_CacheKeyCompareFunction 	mozilla/security/nss/lib/certhigh/ocsp.c:324
3 	plds4.dll 	PL_HashTableRawLookup 	mozilla/nsprpub/lib/ds/plhash.c:200
4 	nss3.dll 	nss3.dll@0xa101f 	

Implies that the problem is here:

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/util/secitem.c&rev=1.14&mark=165#165
Comment 2 Mike Beltzner [:beltzner, not reading bugmail] 2008-05-12 14:06:00 PDT
I should note that the crash only happened to me when I tried to view the certificate, though mconnor claims that he crashed at other times.
Comment 3 Nelson Bolyard (seldom reads bugmail) 2008-05-12 15:52:39 PDT
Kai, This crash implies that the pointers and/or lengths in the OCSP cache
are bogus.  The function that crashes is well protected against NULL pointers
and zero lengths, but bogus non-NULL pointers will crash it.  
Since you wrote the OCSP cache code, please take a look.
Comment 4 Nelson Bolyard (seldom reads bugmail) 2008-05-12 21:30:07 PDT
Mike & Mike,  When you say "off by > 2 days", 
is that system clock in the past? or the future?

Kai, I'm guessing that some cache entry has been freed, or otherwise 
overwritten, while the cache's PLHashTable still points to it.
Comment 5 Mike Connor [:mconnor] 2008-05-12 22:41:13 PDT
In the past.  I was set to May 3rd on the new machine, never noticed (and, oddly enough, the internet time sync was failing)
Comment 6 Chris Ilias [:cilias] 2008-05-13 11:55:35 PDT
user-doc-complete
<http://support.mozilla.com/en-US/kb/Firefox+crashes#Your_system_clock_is_not_correct>
Comment 7 Nelson Bolyard (seldom reads bugmail) 2008-05-13 14:12:40 PDT
Chris, when I go to the URL you gave in comment 6, I get a page that has 
"Your system clock is cnot correct" in the page outline at the top, 
but no corresponding section appears in the body of the document. 

In the page outline, that heading appears between "nsBrowserOpt.dll" and
"Other causes", but in the body of the page, the section "Other causes"
follows "nsBorwsefrOpt.dll" immediately with no intervening section about
the system clock.  

An examination of the document source shows that there is some stuff there
in the source, between those two sections, but none of it displays on the 
browser page window.  
Comment 8 Jesse Ruderman 2008-05-20 17:40:10 PDT
[@ memcmp] is topcrash #6 for Firefox 3 RC1.  I picked five of them at random and they were all being called by SECITEM_CompareItem_Util.

http://crash-stats.mozilla.com/topcrasher/byversion/Firefox/3.0
Comment 9 Nelson Bolyard (seldom reads bugmail) 2008-05-20 18:20:48 PDT
Mike & Mike (or anybody who has reproduced it):
Do you have one or more specific URLs with which to reproduce this?
Comment 10 Johnathan Nightingale [:johnath] 2008-05-21 06:00:35 PDT
(In reply to comment #9)
> Mike & Mike (or anybody who has reproduced it):
> Do you have one or more specific URLs with which to reproduce this?

STR:

1) Set system clock back 2 weeks
2) Open Firefox
3) Visit https://www.godaddy.com
  - Note that, since the OCSP response failed to validate, being out of range, you do not see EV treatment, just normal SSL
4) Page Info->Security->View Certificate
5) Crash
Comment 11 Johnathan Nightingale [:johnath] 2008-05-21 08:50:58 PDT
Does the work in bug 433594 shed any light on this?  I recognize that the code in question is in a different location, but given that both bugs concern code that deals with OCSP checks (and their failures) I thought it would be particularly smashing if one fix covered both, or at least lead to an obvious fix here as well.
Comment 12 Kai Engert (:kaie) 2008-05-21 10:00:31 PDT
Created attachment 321965 [details] [diff] [review]
fake time patch for testing on linux

Sorry that it took me so long until I've started to look at this bug.
I'm able to reproduce.
FWIW, instead of changing system time, this little patch also helps me to reproduce the crash.
Comment 13 Nelson Bolyard (seldom reads bugmail) 2008-05-21 12:18:49 PDT
Comment on attachment 321965 [details] [diff] [review]
fake time patch for testing on linux

This patch is a keeper!
Thanks Kai
Comment 14 Kai Engert (:kaie) 2008-05-21 12:24:58 PDT
Created attachment 321986 [details] [diff] [review]
(minimal) Patch v1 

This patch fixes the crash for me.

The crash is caused by destroying a pointer while still holding to it elsewhere.
- pkix_pl_OcspResponse_GetStatusForCert derives the status from the response,
  but at the same time creates a response object.
  It correctly sets the flag that our certID object got consumed for
  the new cache entry
- Function PKIX_PL_OcspCertID_RememberOCSPProcessingFailure
  was meant to be called only when we have not yet created a cache entry.
  The code makes the incorrect assumption that (!passed) indicates
  "no cache entry created yet". That's wrong. And we don't check whether
  the certID got already consumed. (It was.)
- We call PKIX_PL_OcspCertID_RememberOCSPProcessingFailure,
  which reuses the same flag variable for the consumption status,
  which gets set to FALSE when some function deeper down in OCSP processing
  prepares the out parameter
- no second cache entry gets created (correct), but the flag "was consumed"
  incorrectly got reverted to "no"
- we destroy the certID object, although it's still in possession by the 
  cache
- we crash on the next operation that accesses the cache

This bug got introduced when I cache-enabled libpkix (wrapper code around the OCSP cache).
Comment 15 Nelson Bolyard (seldom reads bugmail) 2008-05-21 12:35:30 PDT
Created attachment 321988 [details] [diff] [review]
set PR_Now() back two weeks on Windows (not for checkin)
Comment 16 Kai Engert (:kaie) 2008-05-21 12:38:12 PDT
Created attachment 321990 [details] [diff] [review]
(cleaner) Patch v2

While the previous comment lists a minimal patch, Nelson has complained about the concept of using a decoupled "pointer was consumed" variable, and I have to agree. He proposes to simply set the pointer variable to NULL as soon as we no longer own the pointer.

Here is an alternative patch, which is probably cleaner, but also larger.

So the question is:
Should we take the minimal patch or the cleaner patch for NSS 3.12.0 / Firefox 3.0.0 ?
Comment 17 Nelson Bolyard (seldom reads bugmail) 2008-05-21 12:55:40 PDT
Kai's comment 14 suggests that this is another manifestation of bug 435007.
The "(minimal) patch v1", attachment 321986 [details] [diff] [review], above conflicts with the patch 
attached to that bug, which (I'm sure) is why Kai didn't request review for 
this patch.

The second patch attached here appears to be a merger of the two patches.
I will review it.
Comment 18 Nelson Bolyard (seldom reads bugmail) 2008-05-21 13:06:27 PDT
Comment on attachment 321990 [details] [diff] [review]
(cleaner) Patch v2

This patch appears to be correct, and is what we want for NSS 3.12.1 and beyond.  
However, if we're going to spin another NSS 3.12.0 release candidate for FF3 RC2, Maybe the "minimal patch v1" is less risk for that.  Let's ask Bob.

Also note that if we respin another NSS 3.12.0 Release Candidate, we MUST also pick up the fix for bug 433594.
Comment 19 Nelson Bolyard (seldom reads bugmail) 2008-05-21 13:13:35 PDT
Comment on attachment 321986 [details] [diff] [review]
(minimal) Patch v1 


>-        if (!passed && cid) {
>+        if (!passed && cid && !cid->certIDWasConsumed) {

Actually, it would be much safer to use

>+        if (!passed && cid && cid->certID && !cid->certIDWasConsumed) {

Please do that.  r+ for the 3.12.0 branch with that additional change.
I'm asking Bob for SR for that branch.
Since either of the two patches to this bug depend on the patch for 
bug 433594, I'm asking Bob to SR that bug's patch, too.
Comment 20 Nelson Bolyard (seldom reads bugmail) 2008-05-21 13:14:52 PDT
Marking this bug as depending on bug 433594, since the fix to that bug
is pre-requisite to the patch(es) for this bug.
Comment 21 Kai Engert (:kaie) 2008-05-21 13:21:34 PDT
Created attachment 321998 [details] [diff] [review]
(minimal) Patch v1 b

Adding Nelson's requested change, keeping Nelsons review flags.
Comment 22 Nelson Bolyard (seldom reads bugmail) 2008-05-21 13:31:16 PDT
(In reply to comment #14)

> - We call PKIX_PL_OcspCertID_RememberOCSPProcessingFailure,
>   which reuses the same flag variable for the consumption status,
>   which gets set to FALSE when some function deeper down in OCSP processing
>   prepares the out parameter
> - no second cache entry gets created (correct), but the flag "was consumed"
>   incorrectly got reverted to "no"

The patch that eliminates the "was consumed" flag presumably fixes the 
problems where the flag's value was incorrect.  Does the problem of 
incorrect values in "was consumed" flag remain with the "minimal patch v1" ?
If the flag is set incorrectly, then adding tests for the value of that 
flag in various places won't actually fix things. 

> - we destroy the certID object, although it's still in possession by the 
>   cache
> - we crash on the next operation that accesses the cache
> 
> This bug got introduced when I cache-enabled libpkix (wrapper code around the
> OCSP cache).

If "minimal patch v1" doesn't fix that, I must withdraw my r+ from it.
Comment 23 Kai Engert (:kaie) 2008-05-21 13:40:45 PDT
Nelson, the minimal patch avoids the second function call, if our object got already consumed.
It's the second function call that may revert the flag to an incorrect value.

I think the minimal patch is fine.
Comment 24 Robert Relyea 2008-05-21 16:44:57 PDT
Comment on attachment 321998 [details] [diff] [review]
(minimal) Patch v1 b

r+ yes, less take this one in 3.12.0

the other is definately appropriate for 3.12.1

bob
Comment 25 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-05-28 10:46:28 PDT
Comment on attachment 321998 [details] [diff] [review]
(minimal) Patch v1 b

a=shaver for landing on Firefox-destined branch, thanks.
Comment 26 Kai Engert (:kaie) 2008-05-28 11:05:57 PDT
checked in to 3.12.0 branch for rc4

Checking in libpkix/pkix/checker/pkix_ocspchecker.c;
/cvsroot/mozilla/security/nss/lib/libpkix/pkix/checker/pkix_ocspchecker.c,v  <--  pkix_ocspchecker.c
new revision: 1.8.2.1; previous revision: 1.8
done
Comment 27 Kai Engert (:kaie) 2008-05-29 10:11:55 PDT
checked in the cleaner patch to 3.12 trunk for 3.12.1
marking fixed

Checking in pkix/checker/pkix_ocspchecker.c;
/cvsroot/mozilla/security/nss/lib/libpkix/pkix/checker/pkix_ocspchecker.c,v  <--  pkix_ocspchecker.c
new revision: 1.9; previous revision: 1.8
done
Checking in pkix_pl_nss/pki/pkix_pl_ocspcertid.c;
/cvsroot/mozilla/security/nss/lib/libpkix/pkix_pl_nss/pki/pkix_pl_ocspcertid.c,v  <--  pkix_pl_ocspcertid.c
new revision: 1.4; previous revision: 1.3
done
Checking in pkix_pl_nss/pki/pkix_pl_ocspcertid.h;
/cvsroot/mozilla/security/nss/lib/libpkix/pkix_pl_nss/pki/pkix_pl_ocspcertid.h,v  <--  pkix_pl_ocspcertid.h
new revision: 1.2; previous revision: 1.1
done
Checking in pkix_pl_nss/pki/pkix_pl_ocspresponse.c;
/cvsroot/mozilla/security/nss/lib/libpkix/pkix_pl_nss/pki/pkix_pl_ocspresponse.c,v  <--  pkix_pl_ocspresponse.c
new revision: 1.10; previous revision: 1.9
done
Comment 28 Nelson Bolyard (seldom reads bugmail) 2008-06-02 12:50:09 PDT
*** Bug 436903 has been marked as a duplicate of this bug. ***
Comment 29 chris hofmann 2008-06-02 14:16:05 PDT
as mentioned in bug 436903 this bug moved to #6 topcrash on RC1.  we should watch for a reduction on the next release candidate.

Note You need to log in before you can comment on or make changes to this bug.