Closed Bug 435496 Opened 16 years ago Closed 6 years ago

crash [@ ocsp_CacheKeyHashFunction]

Categories

(NSS :: Libraries, defect, P1)

3.12
x86
Windows XP
defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: MatsPalmgren_bugz, Unassigned)

References

Details

(Keywords: crash)

Crash Data

"Top Crashers for Firefox 3.0" currently has this crash at #21.

A few crash IDs:
bp-c64537fd-266d-11dd-a520-001cc45a2c28
bp-db190d47-25f2-11dd-9d85-001cc45a2ce4
bp-73c6bb97-25a4-11dd-84b4-0013211cbf8a

A couple of example stacks:

ocsp_CacheKeyHashFunction	 mozilla/security/nss/lib/certhigh/ocsp.c:300
PL_HashTableRemove	mozilla/nsprpub/lib/ds/plhash.c:374
nss3.dll@0xa101f	
nsNSSComponent::DoProfileBeforeChange	mozilla/security/manager/ssl/src/nsNSSComponent.cpp:2388
nsNSSComponent::Observe	mozilla/security/manager/ssl/src/nsNSSComponent.cpp:1972
nsObserverService::NotifyObservers	mozilla/xpcom/ds/nsObserverService.cpp:181
xul.dll@0x7b88d3	
xul.dll@0x7b89a3	

==================================

ocsp_CacheKeyHashFunction	
PL_HashTableRemove	mozilla/nsprpub/lib/ds/plhash.c:374
ocsp_RemoveCacheItem	
CERT_ClearOCSPCache	
OCSP_ShutdownGlobal	
NSS_Shutdown	
nsNSSComponent::ShutdownNSS	mozilla/security/manager/ssl/src/nsNSSComponent.cpp:1690
nsNSSComponent::DoProfileBeforeChange	mozilla/security/manager/ssl/src/nsNSSComponent.cpp:2388
nsNSSComponent::Observe	mozilla/security/manager/ssl/src/nsNSSComponent.cpp:1972
nsObserverList::NotifyObservers	mozilla/xpcom/ds/nsObserverList.cpp:128
nsObserverService::NotifyObservers	mozilla/xpcom/ds/nsObserverService.cpp:181
nsXREDirProvider::DoShutdown	mozilla/toolkit/xre/nsXREDirProvider.cpp:853
ScopedXPCOMStartup::~ScopedXPCOMStartup	mozilla/toolkit/xre/nsAppRunner.cpp:905
XRE_main	mozilla/toolkit/xre/nsAppRunner.cpp:3211
main	mozilla/browser/app/nsBrowserApp.cpp:158
Odds are very good that this bug has the same cause as bug 433594, and 
that the fix for bug 433386 will also cure this crash, but to be safe, 
we shuold also insert a NULL pointer check into ocsp_CacheKeyHashFunction.

As an aside, I really don't understand why PL_HashTableRemove is calling
ocsp_CacheKeyHashFunction here.  That seems wrong.  I want to understand
that before declaring that a NULL check in ocsp_CacheKeyHashFunction is 
the right fix.
After looking at this further, I no longer think that avoiding a NULL ptr 
dereference in ocsp_CacheKeyHashFunction will solve any problem.  The 
problem is probably not a NULL ptr but rather a stale ptr.

This hash function returns a value that is used to find a "hash bucket"
that is the head of a chain of objects of this type.  All return values 
from this hash function are valid. There is no value that this function
can return that means "not found" or "no valid hash for this object". 

Obviously this function is being called with a stale certID handle,
which means that a certID cache entry contains a pointer to a stale 
certID.  A certID handle has been put into the cache, and then that 
certID has been destroyed.  

The only real fix to this problem is to avoid calling PL_HashTableRemove 
with a stale certID handle/ptr.  It wouldn't hurt to enclose the call to 
PL_HashTableRemove inside ocsp_RemoveCacheItem with a NULL test 
(item->certID != NULL) but that won't detect a stale cache entry.
Since there's no good way to detect that a certID handle is stale, we must prevent cache entries from becoming stale.

I think bug 433386 is one known cause of that problem.  
Kai, can you confirm that?

If we knew how to cause this particular crash, and could reproduce it,  
we could test whether the fix for bug 433386 fixes this crash too, or not.
I think I cc'ed the wrong kai. :(
(In reply to comment #2)
> If we knew how to cause this particular crash, and could reproduce it,  

Most of the crash report comments mentioned "startup" or "quit".
Yes, the second stack trace above clearly happens during shutdown.
Some code path has apparently destroyed an object to which the cache 
still holds a pointer.  The question is: what series of operations 
(e.g. a visit to what URL) leaves the ocsp certID cache in that state.

I think the solution to this may involve making the certID object be 
a reference counted object.  The present strategy, where a certID 
object can only have one "owner" who holds the sole reference to the 
object, seems doomed.  I can see that it would work when creating 
the certID object, and putting it into the cache.  Once the reference
is put into the cache, the caller no longer needs to hold its copy.
But when find the certID in the cache, you clearly want to have two
references to it, one that remains in the cache, and another than is
returned to the caller of the cache lookup.  I think the present design 
just doesn't handle that important and most common case at all well.
But I could be wrong.
This bug and bug 428746 probably have the same cause.
Keywords: topcrash
Whiteboard: possibly fixed in rc2 by bug 433386
I wonder if this bug is a side effect of the cause of bug 433594, which is
now fixed.
Priority: -- → P2
Target Milestone: --- → 3.12.1
Priority: P2 → P1
Having heard nothing new about this bug in 9 months, I gather it is no 
longer an issue.  If it is, we need new evidence.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WORKSFORME
There are 215 reported Firefox crashes in the past 4 weeks with this signature.
http://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=exact&query=ocsp_CacheKeyHashFunction&date=&range_value=4&range_unit=weeks&do_query=1&signature=ocsp_CacheKeyHashFunction

All except two are with Firefox 3.0 rc1 or earlier.  The exceptions are:
Firefox 3.0 rc2 on Windows XP:
bp-07225492-9907-4987-a3fd-92d552090326
Firefox 3.0.8 on MacOSX 10.4.11:
bp-7af8a2a1-37fa-4287-871e-920222090401

So it seems bug 433594 made this crash much less likely to occur,
but there is evidence it still occurs with that bug fixed.
Status: RESOLVED → REOPENED
Keywords: topcrash
Priority: P1 → --
Resolution: WORKSFORME → ---
Whiteboard: possibly fixed in rc2 by bug 433386
Mats, Thanks for that additional info.  I propose that, henceforth, the scope 
of this bug will be about the crash in Firefox 3.0.8 on MacOSX 10.4.11: 
bp-7af8a2a1-37fa-4287-871e-920222090401
http://crash-stats.mozilla.com/report/index/7af8a2a1-37fa-4287-871e-920222090401
FWIW, it also occurs in Firefox 3.5 on Windows XP:
bp-13386624-3ff5-45d4-94df-df29d2090625
Assignee: nobody → julien.pierre.boogz
Priority: -- → P1
Target Milestone: 3.12.1 → 3.12.5
Version: unspecified → 3.12
Mats,
Thanks for the report in comment 11 . That one has a stack with the full debug info.

From it, I can tell that the crash is due to an invalid issuerNameHash in a CERTOCSPCertID in the OCSP cache. Either len is incorrect, or the data pointer is NULL.

I will check what can cause that situation to occur.
The only place where issuerNameHash is supposed to be set is in ocsp_CreateCertID, by a call to cert_GetSubjectNameDigest, which in turns calls ocsp_DigestValue .

Looking at all the error paths, I can't find any which would result in an invalid issuerNameHash . There are proper size checks in ocsp_DigestValue, such as the function would fail if the fill buffer was too small. This would in turn cause ocsp_CertID to fail.

My preliminary condition is thus that the OCSPCertID was somehow corrupt some time after it was created initially successfully in a valid state. It would be very helpful to dump the structure from the crash logs as is possible with a core file, but I don't know if this capability is available.
Target Milestone: 3.12.5 → ---
Bugs that are currently assigned to Julien => assigning to nobody.
Search for 20100628-kaie-jp
Assignee: bugzilla+nospam → nobody
Crash Signature: [@ ocsp_CacheKeyHashFunction]
(In reply to Mats Palmgren (:mats) from comment #9)
> There are 215 reported Firefox crashes in the past 4 weeks with this
> signature.

way rare these days - on average only a couple per version per month
attempting to contact user of bp-4a45a655-cdf3-42c1-8534-0632b2130311
Closing because no crash reported since 12 weeks.
Status: REOPENED → RESOLVED
Closed: 15 years ago6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.