Closed Bug 700499 Opened 13 years ago Closed 5 years ago

Crash in nsNSSComponent::ShutdownNSS @ IssuerCache_Destroy

Categories

(NSS :: Libraries, defect)

Type: defect
Priority: Not set
Severity: critical

Tracking

(firefox16 affected, firefox17 affected, firefox18 affected, firefox19 affected, b2g18 affected)

RESOLVED WORKSFORME

People

(Reporter: nhirata, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: crash, Whiteboard: [mobile-crash][native-crash][b2g-crash])

Crash Data

This bug was filed from the Socorro interface and is report bp-7ee70147-0000-4c56-90e8-eae642111101.
=============================================================

Crashing Thread
Frame 	Module 	Signature 	Source
0 	libnss3.so 	nssCertificate_Destroy 	certificate.c:142
1 	libnss3.so 	NSSCertificate_Destroy 	certificate.c:182
2 	libnss3.so 	CERT_DestroyCertificate 	stanpcertdb.c:823
3 	libnss3.so 	IssuerCache_Destroy 	crl.c:1162
4 	libnss3.so 	FreeIssuer 	crl.c:1275
5 	libplds4.so 	PL_HashTableEnumerateEntries 	nsprpub/lib/ds/plhash.c:406
6 	libnss3.so 	ShutdownCRLCache 	crl.c:1340
7 	libnss3.so 	nss_Shutdown 	nssinit.c:1098
8 	libnss3.so 	NSS_Shutdown 	nssinit.c:1156
9 	libxul.so 	nsNSSComponent::ShutdownNSS 	security/manager/ssl/src/nsNSSComponent.cpp:1918
10 	libxul.so 	nsNSSComponent::DoProfileBeforeChange 	security/manager/ssl/src/nsNSSComponent.cpp:2597
11 	libxul.so 	nsNSSComponent::Observe 	security/manager/ssl/src/nsNSSComponent.cpp:2231
12 	libxul.so 	nsObserverList::NotifyObservers 	xpcom/ds/nsObserverList.cpp:130
13 	libxul.so 	nsObserverService::NotifyObservers 	xpcom/ds/nsObserverService.cpp:182
14 	libxul.so 	nsXREDirProvider::DoShutdown 	toolkit/xre/nsXREDirProvider.cpp:814
15 	libxul.so 	ScopedXPCOMStartup::~ScopedXPCOMStartup 	toolkit/xre/nsAppRunner.cpp:1104
16 	libxul.so 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3600
17 	libxul.so 	Java_org_mozilla_gecko_GeckoAppShell_nativeRun 	toolkit/xre/nsAndroidStartup.cpp:132
18 	libmozutils.so 	Java_org_mozilla_gecko_GeckoAppShell_nativeRun 	other-licenses/android/APKOpen.cpp:232
19 	libdvm.so 	libdvm.so@0x11c76 	
20 	dalvik-LinearAlloc (deleted) 	dalvik-LinearAlloc @0x207a56 	
21 	dalvik-heap (deleted) 	dalvik-heap @0x7582d6 	
22 	libdvm.so 	libdvm.so@0x41185 	
23 	data@app@org.mozilla.fennec-2.apk@classes.dex 	data@app@org.mozilla.fennec-2.apk@classes.dex@0x14400 	
24 	libmozutils.so 	Java_org_mozilla_gecko_GeckoAppShell_nativeInit 	other-licenses/android/APKOpen.cpp:231
25 	dalvik-LinearAlloc (deleted) 	dalvik-LinearAlloc @0x207a56 	
26 	libdvm.so 	libdvm.so@0x4113b 	
27 	dalvik-heap (deleted) 	dalvik-heap @0x7582d6 	
28 	libdvm.so 	libdvm.so@0x46787 	
29 	dalvik-LinearAlloc (deleted) 	dalvik-LinearAlloc @0x207a56 	
30 	data@app@org.mozilla.fennec-2.apk@classes.dex 	data@app@org.mozilla.fennec-2.apk@classes.dex@0xceb4 	
31 	dalvik-heap (deleted) 	dalvik-heap @0x7582d6 	
32 	libdvm.so 	libdvm.so@0x11e3e 	
33 	libdvm.so 	libdvm.so@0x16e9e 	
34 	libdvm.so 	libdvm.so@0x1bd5e 	
35 	libdvm.so 	libdvm.so@0x1bcce 	
36 	dalvik-LinearAlloc (deleted) 	dalvik-LinearAlloc @0x20903e 	
37 	libdvm.so 	libdvm.so@0x1ae12 	
38 	libdvm.so 	libdvm.so@0x16b1a 	
39 	core.odex 	core.odex@0x8a3ce 	
40 	dalvik-heap (deleted) 	dalvik-heap @0x75b4a6 	
41 	dalvik-LinearAlloc (deleted) 	dalvik-LinearAlloc @0x207f02 	
42 	dalvik-mark-stack (deleted) 	dalvik-mark-stack @0x4e0bf6e 	
43 	libdvm.so 	libdvm.so@0x9ef76 	
44 	libdvm.so 	libdvm.so@0x16b7e 	
45 	libdvm.so 	libdvm.so@0x16bf6 	
46 	libdvm.so 	libdvm.so@0x16a9e 	
47 	libdvm.so 	libdvm.so@0x16ac6 	
48 	libdvm.so 	libdvm.so@0x16b1a
I ended up crashing on the LG Revolution with this.
https://crash-stats.mozilla.com/report/index/b4aefe6e-ed8f-497a-bb64-1d2aa2120209

I believe that the STRs are:
1. go to https://secure.hulu.com/plus/mobile
2. timed out on my connection; device went to sleep
3. wake up and quit

Regression on bug 182803?

I can't seem to repro at this moment in time.
Whiteboard: [mobile-crash] → [mobile-crash][native-crash]
Version: Other Branch → unspecified
It's #3 top crasher in Fennec 11.0b3.
Keywords: topcrash
It's fallen down to the 69th crash in Aurora.  Not on the chart for Nightly.  Not sure how many people are hitting SSL sites with Aurora and Nightly.  Might pick up in Beta or release at that point.  Removing topcrash keyword.
Keywords: topcrash
The SSL component is the same for XUL and Android Fennec.

It's #28 top browser crasher in Fennec 10.0.2 and #2 in Fennec 11.0b5.
See bug 726028 comment 4 for the regression range in 11.0.
Keywords: regression, topcrash
Version: unspecified → 11 Branch
Component: Security → Security: PSM
QA Contact: toolkit → psm
#72 in 11.0 release now, so I'm removing topcrash, but it's still significant, so I'd hope someone would take a look.
Keywords: topcrash
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #6)
> #72 in 11.0 release now, so I'm removing topcrash, but it's still
> significant, so I'd hope someone would take a look.
It's about bug 726028 which is the desktop version of this crash.

As there's no XUL 11.0, I agree that the topcrash keyword can be removed, but it has a good chance to be a topcrasher once Native 13.0 is released.
It's #2 top crasher in XUL Fennec 12.0b2.
Keywords: topcrash
tracking-fennec: --- → ?
This can occur in Native Fennec.  bug 726028 is most likely a duplicate.
https://crash-stats.mozilla.com/report/index/f3946300-7f7a-4b2b-8f01-3de572120408

Still a top crash within Fennec XUL 12.0b3
Hey bsmith. Any insights on this one?
https://crash-stats.mozilla.com/report/list?signature=nssCertificate_Destroy - this is cross-product, also happening on Firefox desktop 11.0 (quite a bit) and FennecAndroid 14.0a1
It's no longer a top crasher in XUL Fennec 13.0b1 (no crash) while it was #2 in 12.0b6.
tracking-fennec: ? → ---
Keywords: topcrash
Blocks: 761987
It's #38 top crasher in 14.0b7, #13 in 15.0a2 and #33 in 16.0a1.

The stack sometimes looks like:
Frame 	Module 	Signature 	Source
0 	libnss3.so 	nssCertificate_Destroy 	certificate.c:134
1 	libnss3.so 	NSSCertificate_Destroy 	certificate.c:182
2 	libnss3.so 	CERT_DestroyCertificate 	stanpcertdb.c:823
3 	libssl3.so 	ssl3_CleanupPeerCerts 	ssl3con.c:7842
4 	libssl3.so 	ssl3_DestroySSL3Info 	ssl3con.c:9582
5 	libssl3.so 	ssl_DestroySocketContents 	sslsock.c:410
6 	libssl3.so 	ssl_FreeSocket 	sslsock.c:482
7 	libssl3.so 	ssl_DefClose 	ssldef.c:233
8 	libssl3.so 	ssl_SecureClose 	sslsecur.c:1094
9 	libssl3.so 	ssl_Close 	sslsock.c:1727
10 	libxul.so 	nsNSSSocketInfo::CloseSocketAndDestroy 	security/manager/ssl/src/nsNSSIOLayer.cpp:1825
11 	libxul.so 	nsSSLIOLayerClose 	security/manager/ssl/src/nsNSSIOLayer.cpp:1815
12 	libnspr4.so 	PR_Close 	nsprpub/pr/src/io/priometh.c:136
13 	libxul.so 	nsSocketTransport::ReleaseFD_Locked 	netwerk/base/src/nsSocketTransport2.cpp:1441
14 	libxul.so 	nsSocketTransport::OnSocketDetached 	netwerk/base/src/nsSocketTransport2.cpp:1684
15 	libxul.so 	nsSocketTransportService::DetachSocket 	netwerk/base/src/nsSocketTransportService2.cpp:214
16 	libxul.so 	nsSocketTransportService::DoPollIteration 	netwerk/base/src/nsSocketTransportService2.cpp:785
17 	libxul.so 	nsSocketTransportService::Run 	netwerk/base/src/nsSocketTransportService2.cpp:645
18 	libxul.so 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:656
19 	libxul.so 	NS_ProcessNextEvent_P 	obj-firefox/xpcom/build/nsThreadUtils.cpp:245
20 	libxul.so 	nsThread::ThreadFunc 	xpcom/threads/nsThread.cpp:289
21 	libnspr4.so 	_pt_root 	nsprpub/pr/src/pthreads/ptthread.c:187
22 	libc.so 	__thread_entry 	
23 	libc.so 	pthread_create 	

More reports at:
https://crash-stats.mozilla.com/report/list?signature=nssCertificate_Destroy
https://crash-stats.mozilla.com/report/list?signature=nssCertificateStore_Lock
Crash Signature: [@ nssCertificate_Destroy] → [@ nssCertificate_Destroy] [@ nssCertificateStore_Lock]
It's #12 top crasher in 15.0b3.
It's #10 top crasher in 15.0.
Keywords: topcrash
See Also: → 726028, 433108, 428038
tracking-fennec: --- → ?
I believe that this crash occurs for the crash reporter after sending a crash report... I could be mistaken.
Crashes in NSS should be assigned to the NSS library product (unless you have reason to believe the crash is caused by e.g. memory corruption in the application code).
tracking-fennec: ? → ---
Assignee: nobody → nobody
Component: Security: PSM → Libraries
Product: Core → NSS
Version: 11 Branch → unspecified
Keywords: regression
OS: Android → All
Hardware: ARM → All
See Also: 726028
This is also affecting B2G: bp-fe32f889-de72-4686-9edb-d82b62121205
There are (at least) two types of crashes that end in nssCertificate_Destroy or nssCertificateStore_Lock: one that occurs when we're destroying the issuer cache during NSS Shutdown (this bug), and one that occurs when we're tearing down an SSL socket (bug 791194). I'm tracking them separately for now, though they could both end up having the same underlying cause.
Crash Signature: [@ nssCertificate_Destroy] [@ nssCertificateStore_Lock] → [@ nssCertificate_Destroy | NSSCertificate_Destroy | CERT_DestroyCertificate | IssuerCache_Destroy] [@ nssCertificateStore_Lock | nssCertificate_Destroy | NSSCertificate_Destroy | CERT_DestroyCertificate | IssuerCache_Destroy]
Summary: crash nssCertificate_Destroy → Crash [@ nssCertificate_Destroy | NSSCertificate_Destroy | CERT_DestroyCertificate | IssuerCache_Destroy][@ nssCertificateStore_Lock | nssCertificate_Destroy | NSSCertificate_Destroy | CERT_DestroyCertificate | IssuerCache_Destroy]
(In reply to Brian Smith (:bsmith) from comment #21)
> There are (at least) two types of crashes

I fear that the crash signatures you changed those to are not something that crash-stats is seeing, so you make all those crashes appear there as if no bug is connected to them. So, please just add the same signature actually recorded by crash-stats to both bugs - and if we should actually end up there with signatures like you changed them to here, you need to file a bug against Socorro to add the top frames to the prefix "skiplist".
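To illustrate the skiplist point: a minimal sketch of how a prefix list can turn a single top frame into a longer joined signature. This is a simplified model for illustration only, not Socorro's actual code; `MakeSignature` and `prefixFrames` are hypothetical names, and the real skiplists are more elaborate.

```cpp
#include <set>
#include <string>
#include <vector>

// Simplified, hypothetical model of prefix-based signature generation.
// `frames` holds function names, top of stack first. Frames listed in
// `prefixFrames` are treated as too generic to identify a crash on their
// own, so the signature keeps growing until it reaches a frame that is
// not in that set.
std::string MakeSignature(const std::vector<std::string>& frames,
                          const std::set<std::string>& prefixFrames) {
  std::string signature;
  for (const std::string& frame : frames) {
    if (!signature.empty()) {
      signature += " | ";
    }
    signature += frame;
    if (prefixFrames.count(frame) == 0) {
      break;  // the first frame outside the prefix list ends the signature
    }
  }
  return signature;
}
```

With an empty prefix list the signature is just the top frame (`nssCertificate_Destroy`). Only after the top three frames are added to the prefix list would the signature extend down to `IssuerCache_Destroy`.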
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #22)
> (In reply to Brian Smith (:bsmith) from comment #21)
> > There are (at least) two types of crashes
> 
> I fear that the crash signatures you changed those to are not something that
> crash-stats is seeing, so you make all those crashes appear there as if no
> bug is connected to them. So, please just add the same signature actually
> recorded by crash-stats to both bugs - and if we should actually end up
> there with signatures like you changed them to here, you need to file a bug
> against Socorro to add the top frames to the prefix "skiplist".

OK. Thanks for reminding me about this. It seems like I relearn this every time I work on a crash bug.
Crash Signature: [@ nssCertificate_Destroy | NSSCertificate_Destroy | CERT_DestroyCertificate | IssuerCache_Destroy] [@ nssCertificateStore_Lock | nssCertificate_Destroy | NSSCertificate_Destroy | CERT_DestroyCertificate | IssuerCache_Destroy] → [@ nssCertificate_Destroy] [@ nssCertificateStore_Lock]
Summary: Crash [@ nssCertificate_Destroy | NSSCertificate_Destroy | CERT_DestroyCertificate | IssuerCache_Destroy][@ nssCertificateStore_Lock | nssCertificate_Destroy | NSSCertificate_Destroy | CERT_DestroyCertificate | IssuerCache_Destroy] → Crash [@ nssCertificate_Destroy] [@ nssCertificateStore_Lock]
It's #12 top crasher in 17.0.
Crash Signature: [@ nssCertificate_Destroy] [@ nssCertificateStore_Lock] → [@ nssCertificate_Destroy] [@ nssCertificate_Destroy | NSSCertificate_Destroy | CERT_DestroyCertificate | IssuerCache_Destroy ] [@ nssCertificateStore_Lock] [@ nssCertificateStore_Lock | nssCertificate_Destroy | NSSCertificate_Destroy | CERT_DestroyCe…
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #24)
> Firefox for Android now has those two signatures - should new bugs be filed
> for those?
I filed bug 827264 for the stack trace in comment 13.
Crash Signature: CERT_DestroyCertificate | IssuerCache_Destroy] → CERT_DestroyCertificate | IssuerCache_Destroy] [@ PORT_FreeArena_Util | CERT_DestroyCertificate | IssuerCache_Destroy] [@ PR_AtomicDecrement | nssCertificate_Destroy | NSSCertificate_Destroy | CERT_DestroyCertificate | IssuerCache_Destroy] [@ PR_Lock |…
Summary: Crash [@ nssCertificate_Destroy] [@ nssCertificateStore_Lock] → Crash in nsNSSComponent::ShutdownNSS @ IssuerCache_Destroy
No longer blocks: 830046
Assignee: nobody → rdow
This is #15 on 18.0.2 right now.

bsmith, is there anything we can do to mitigate/reduce those (or at least find out where the culprit lies)?
My opinion is that we have a data overwrite somewhere (probably from not observing the proper locking protocol). I am working on a patch, which for now will only help us localize the problem.
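One way a localizing patch of this kind could work is to stamp each tracked object with canary values and check them from many call sites, so that a failure shows up close to the corrupting write rather than at shutdown. The following is only a sketch under that assumption; `TrackedObject`, `kCanary`, `LooksIntact` and `FirstCorrupted` are hypothetical names and none of this is NSS code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an object we want to watch; not an NSS struct.
struct TrackedObject {
  std::uint32_t headCanary;  // guards against writes running in from before
  int refCount;              // must stay positive while the object is live
  std::uint32_t tailCanary;  // guards against writes running off the end
};

constexpr std::uint32_t kCanary = 0xC0FFEE42u;

// Puts an object into a known-good state.
void InitTracked(TrackedObject& obj) {
  obj.headCanary = kCanary;
  obj.refCount = 1;
  obj.tailCanary = kCanary;
}

// Cheap sanity check intended to be called from many places.
bool LooksIntact(const TrackedObject& obj) {
  return obj.headCanary == kCanary && obj.tailCanary == kCanary &&
         obj.refCount > 0;
}

// Returns the index of the first damaged object, or -1 if all look intact.
int FirstCorrupted(const std::vector<TrackedObject>& objects) {
  for (std::size_t i = 0; i < objects.size(); ++i) {
    if (!LooksIntact(objects[i])) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

A check like this does not say who did the overwrite, only that it happened before the call site that noticed it, so it narrows the window by trial and error.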
Conversation between Kai Engert (:kaie) and Rand Dow (:randix)

:randix 2/16/2013 11:19 PST
> These NSS_Shutdown bugs look like memory corruption, probably a misuse
> or lack of use of locking at some point in Firefox.

:kaie 2/18/2013 12:18 PST
That's possible. When I started to work on that code, I had suggested to
hunt for incorrect threading code in 2001 (11.5 years ago) in bug
https://bugzilla.mozilla.org/show_bug.cgi?id=101005
but I never had time to focus on that, nor did anyone else make
such research a priority. Instead, fixes for individual problems
were attempted over time.

:randix
> Finding such things by code examination (and I'm new to the code base)
> is usually impossible. There seems to be only the Android environment
> to reliably reproduce it.
> 
> I am thinking to write a simple certificate verifier, and sprinkle a
> call to this into the code and then run this in our test environment. I
> would hope to get a crash much closer to the actual corrupting event,
> and with a bit of trial and error localize it to right after it
> happens.
> 
> So: my question: Is there a list somewhere of "all" certificates?

:kaie
You are looking for the full list of certificates maintained by NSS? 

In nsNSSCertCache::CacheAllCerts()
I found:
 CERTCertList *newList = PK11_ListCerts(PK11CertListUnique, cxt);

:randix
> I haven't found it. I have found the crl cache (I think) and
> PL_HashTableEnumerateEntries() which I might be able to use, and I
> have found the ShutdownList. 
> 
> Do you have a suggestion how to get a straightforward list of all
> current certificates at any given point in time, and if there is a
> single lock that I can grab to be safe while traversing it?

:kaie
That function will give you a list where each entry has had its
reference count incremented; those references are now owned by your list.

I doubt the problem is inside NSS code (the general purpose C library).

The problem is probably at the PSM level (Mozilla's application code
layer that operates on top of NSS).

In order to safely shut down NSS, any resources obtained from NSS by the
application must be correctly freed. And that's tricky to track from
within Mozilla code, because of the wrapper objects around NSS pointers
(such as nsNSSCertificate) and the fact that their cleanup is owned by
the delayed garbage collection mechanisms of the JavaScript engine.

Before you shut down, you should ensure that all JS and C++ wrapper
objects have been correctly cleaned up, and that no thread is holding on
to anything.

This was difficult to get right. In the past, we had a project where
this had to be 100% correct, because we had an even harder requirement:
we had to shut down and successfully reinitialize NSS afterwards,
without restarting the process.

That's why I had introduced the nsNSSShutDownList, which helped me to
find bugs in the past to track down some of the orphaned pointers.

After that old requirement got dropped, this functionality no longer got
kept up to date. Maybe it could help you to bring this old code back to
being complete, by tracking all referenced objects.
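
The tracking idea described above can be sketched roughly as follows. This is a toy model only, not the real nsNSSShutDownList; `ResourceTracker` and `TrackedHandle` are hypothetical names, and a real implementation would also need locking, since wrappers can be released on other threads.

```cpp
#include <cstddef>
#include <set>

// Hypothetical registry of wrapper objects that still hold library
// resources. Shutdown is only considered safe once it is empty.
class ResourceTracker {
 public:
  void Remember(const void* holder) { live_.insert(holder); }
  void Forget(const void* holder) { live_.erase(holder); }
  std::size_t LiveCount() const { return live_.size(); }
  bool SafeToShutDown() const { return live_.empty(); }

 private:
  std::set<const void*> live_;
};

// Hypothetical wrapper: registers itself on construction and unregisters
// on destruction, so an orphaned wrapper shows up as a live entry.
class TrackedHandle {
 public:
  explicit TrackedHandle(ResourceTracker& tracker) : tracker_(tracker) {
    tracker_.Remember(this);
  }
  ~TrackedHandle() { tracker_.Forget(this); }

  // Copying would create an untracked holder, so it is disabled.
  TrackedHandle(const TrackedHandle&) = delete;
  TrackedHandle& operator=(const TrackedHandle&) = delete;

 private:
  ResourceTracker& tracker_;
};
```

Before shutting down, the application would check `SafeToShutDown()` and, if it fails, log or release whatever is still registered instead of letting the library free objects that are still referenced.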
Whiteboard: [mobile-crash][native-crash] → [mobile-crash][native-crash][b2g-crash]
With combined signatures, it's #15 top crasher in 19.0.2, not enough to qualify it for the topcrash keyword according to https://wiki.mozilla.org/CrashKill/Topcrash
Keywords: topcrash
Crash Signature: | nssCertificateStore_Lock | nssCertificate_Destroy | NSSCertificate_Destroy | CERT_DestroyCertificate | IssuerCache_Destroy] → | nssCertificateStore_Lock | nssCertificate_Destroy | NSSCertificate_Destroy | CERT_DestroyCertificate | IssuerCache_Destroy] [@ jemalloc_crash | arena_dalloc | free | PR_Free | PORT_ZFree_Util | PORT_FreeArena_Util | CERT_DestroyCertificate | IssuerCac…
Assignee: rdow → doug.turner
Assignee: doug.turner → mhamrick
Assignee: mhamrick → nobody
Crash Signature: [@ nssCertificate_Destroy] [@ nssCertificate_Destroy | NSSCertificate_Destroy | CERT_DestroyCertificate | IssuerCache_Destroy ] [@ nssCertificateStore_Lock] [@ nssCertificateStore_Lock | nssCertificate_Destroy | NSSCertificate_Destroy | → [@ nssCertificate_Destroy] [@ nssCertificate_Destroy | NSSCertificate_Destroy | CERT_DestroyCertificate | IssuerCache_Destroy ] [@ nssCertificateStore_Lock] [@ nssCertificateStore_Lock | nssCertificate_Destroy | NSSCertificate_Destroy |
(In reply to Scoobidiver (away) from comment #30)
> With combined signatures, it's #15 top crasher in 19.0.2, not enough to
> qualify it for the topcrash keyword according to
> https://wiki.mozilla.org/CrashKill/Topcrash

Currently it is far lower than that; in fact, it is rare. For version 42.0, there are roughly 120 crashes per week.
https://crash-stats.mozilla.com/search/?signature=~nssCertificate_Destroy&date=%3E2015-12-01&version=42.0&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=platform&_columns=email&_columns=user_comments#crash-reports
Depends on: 101005

Closing because no crashes have been reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
Restrict Comments: true