Closed Bug 827264 Opened 11 years ago Closed 10 years ago

crash in nsNSSSocketInfo::CloseSocketAndDestroy @ ssl3_CleanupPeerCerts

Categories

(NSS :: Libraries, defect)

defect
Not set
critical

Tracking

(firefox17 wontfix, firefox18 wontfix, firefox19 wontfix, firefox20 wontfix, firefox21 wontfix, firefox22 wontfix, firefox23 wontfix, firefox24 wontfix, firefox25 wontfix, firefox31+ verified, firefox32+ verified, firefox33+ verified, b2g-v1.2 affected, b2g-v1.4 affected, b2g-v2.0 affected)

VERIFIED FIXED
Tracking Status
firefox17 --- wontfix
firefox18 --- wontfix
firefox19 --- wontfix
firefox20 --- wontfix
firefox21 --- wontfix
firefox22 --- wontfix
firefox23 --- wontfix
firefox24 --- wontfix
firefox25 --- wontfix
firefox31 + verified
firefox32 + verified
firefox33 + verified
b2g-v1.2 --- affected
b2g-v1.4 --- affected
b2g-v2.0 --- affected

People

(Reporter: scoobidiver, Assigned: cviecco)

References

Details

(Keywords: crash, reproducible, topcrash-android-armv7, Whiteboard: [native-crash][b2g-crash])

Crash Data

Attachments

(2 files)

It's #19 top crasher in 17.0 and #38 in 18.0b7.

Signature 	nssCertificate_Destroy | NSSCertificate_Destroy | CERT_DestroyCertificate | ssl3_CleanupPeerCerts | ssl3_DestroySSL3Info | ssl_DestroySocketContents | ssl_FreeSocket | ssl_DefClose | ssl_SecureClose | ssl_Close | nsNSSSocketInfo::CloseSocketAndDestroy More Reports Search
UUID	35b439be-9f47-4b25-bbb7-4dae12130107
Date Processed	2013-01-07 05:59:55
Uptime	1707
Last Crash	more than 3 months before submission
Install Age	1.3 days since version was first installed.
Install Time	2013-01-05 22:42:56
Product	FennecAndroid
Version	18.0
Build ID	20121231070142
Release Channel	beta
OS	Android
OS Version	0.0.0 Linux 2.6.36.3 #1 SMP PREEMPT Tue Dec 6 09:35:48 KST 2011 armv7l samsung/GT-P7500/GT-P7500:3.2/HTJ85B/BUKL1:user/release-keys
Build Architecture	arm
Build Architecture Info	
Crash Reason	SIGSEGV
Crash Address	0x32256eb9
App Notes 	
AdapterDescription: 'NVIDIA Corporation -- NVIDIA Tegra -- OpenGL ES 2.0 -- Model: GT-P7500, Product: GT-P7500, Manufacturer: samsung, Hardware: p3'
EGL? EGL+ GL Context? GL Context+ GL Layers? GL Layers+ WebGL? WebGL+ 
samsung GT-P7500
samsung/GT-P7500/GT-P7500:3.2/HTJ85B/BUKL1:user/release-keys
Processor Notes 	/data/socorro/stackwalk/bin/exploitable: ERROR: unable to analyze dump
EMCheckCompatibility	True
Adapter Vendor ID	NVIDIA Corporation
Adapter Device ID	NVIDIA Tegra
Device	samsung GT-P7500
Android API Version	13 (REL)
Android CPU ABI	armeabi-v7a

Frame 	Module 	Signature 	Source
0 	libnss3.so 	nssCertificate_Destroy 	certificate.c:102
1 	libnss3.so 	NSSCertificate_Destroy 	certificate.c:150
2 	libnss3.so 	CERT_DestroyCertificate 	stanpcertdb.c:795
3 	libssl3.so 	ssl3_CleanupPeerCerts 	ssl3con.c:8325
4 	libssl3.so 	ssl3_DestroySSL3Info 	ssl3con.c:10310
5 	libssl3.so 	ssl_DestroySocketContents 	sslsock.c:406
6 	libssl3.so 	ssl_FreeSocket 	sslsock.c:465
7 	libssl3.so 	ssl_DefClose 	ssldef.c:206
8 	libssl3.so 	ssl_SecureClose 	sslsecur.c:1063
9 	libssl3.so 	ssl_Close 	sslsock.c:2050
10 	libxul.so 	nsNSSSocketInfo::CloseSocketAndDestroy 	security/manager/ssl/src/nsNSSIOLayer.cpp:692
11 	libxul.so 	nsSSLIOLayerClose 	security/manager/ssl/src/nsNSSIOLayer.cpp:682
12 	libnspr4.so 	PR_Close 	priometh.c:104
13 	libxul.so 	nsSocketTransport::ReleaseFD_Locked 	netwerk/base/src/nsSocketTransport2.cpp:1415
14 	libxul.so 	nsSocketTransport::OnSocketDetached 	netwerk/base/src/nsSocketTransport2.cpp:1658
15 	libxul.so 	nsSocketTransportService::DetachSocket 	netwerk/base/src/nsSocketTransportService2.cpp:187
16 	libxul.so 	nsSocketTransportService::DoPollIteration 	netwerk/base/src/nsSocketTransportService2.cpp:817
17 	libxul.so 	nsSocketTransportService::Run 	netwerk/base/src/nsSocketTransportService2.cpp:645
18 	libxul.so 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:620
19 	libxul.so 	NS_ProcessNextEvent_P 	obj-firefox/xpcom/build/nsThreadUtils.cpp:220
20 	libxul.so 	nsThread::ThreadFunc 	xpcom/threads/nsThread.cpp:258
21 	libnspr4.so 	_pt_root 	ptthread.c:156
22 	libc.so 	__thread_entry 	
23 	libc.so 	pthread_create 	

More reports at:
https://crash-stats.mozilla.com/report/list?signature=nssCertificate_Destroy+|+NSSCertificate_Destroy+|+CERT_DestroyCertificate+|+ssl3_CleanupPeerCerts+|+ssl3_DestroySSL3Info+|+ssl_DestroySocketContents+|+ssl_FreeSocket+|+ssl_DefClose+|+ssl_SecureClose+|+ssl_Close+|+nsNSSSocketInfo%3A%3ACloseSocketAndDestroy
https://crash-stats.mozilla.com/report/list?signature=nssCertificateStore_Lock+|+nssCertificate_Destroy+|+NSSCertificate_Destroy+|+CERT_DestroyCertificate+|+ssl3_CleanupPeerCerts+|+ssl3_DestroySSL3Info+|+ssl_DestroySocketContents+|+ssl_FreeSocket+|+ssl_DefClose+|+ssl_SecureClose+|+ssl_Close+|+nsNSSSocketI...
https://crash-stats.mozilla.com/report/list?signature=PORT_FreeArena_Util+|+CERT_DestroyCertificate+|+ssl3_CleanupPeerCerts+|+ssl3_DestroySSL3Info+|+ssl_DestroySocketContents+|+ssl_FreeSocket+|+ssl_DefClose+|+ssl_SecureClose+|+ssl_Close+|+nsNSSSocketInfo%3A%3ACloseSocketAndDestroy
Crash Signature: ssl_FreeSocket | ssl_DefClose | ssl_SecureClose | ssl_Close | ns...] → ssl_FreeSocket | ssl_DefClose | ssl_SecureClose | ssl_Close | ns...] [@ nssCertificate_Destroy | NSSCertificate_Destroy | CERT_DestroyCertificate | ssl3_CleanupPeerCerts | ssl3_DestroySSL3Info | ssl_DestroySocketContents | ssl_FreeSocket | ssl_DefClose |…
Blocks: 761987
Upon first flash of the b2g phone I had opened up the e.me app, went to linkedin, hit home, switched to twitter, long tap home and closed e.me.  I had not signed in to the either web pages.

After letting it sleep, I came back to find that the phone crashed w/ this signature:
bp-f5bfa065-4842-4627-95b5-1996d2130312

I can't seem to replicate it.
OS: Android → All
Hardware: ARM → All
Whiteboard: [native-crash] → [native-crash][b2g-crash]
I saw this as well once a while back on the Peak - like nhirata I was operating in a e.me app but was not able to reproduce it again.
Attached file logcat of crash
for this crash:
https://crash-stats.mozilla.com/report/index/ff513543-5f51-4f4f-a6e5-48bfd2130604

I add this logcat. The crash seems to start around line 26836 where the last Marionette command is recorded.

I can replicate this crash consistently but it takes 10-30 loops of running this test to invoke it. but generally quite repeatable; this is the 3rd time I have crashed it today.
Keywords: reproducible
I've also identified the piece of code in the automated testcase that is causing it (if I remove it I can't replicate it any more).

Please contact me if you want a more detailed STR. You will need a B2G device and engineering build too and some specific instructions from me.
I was able to reproduce this once using this test:

https://github.com/mozilla/gaia-ui-tests/pull/960/files#L4R18
Crash Signature: nsNSSSocketInfo::CloseSocketAndDestroy(nsNSSShutDownPreventionLock const&)] → nsNSSSocketInfo::CloseSocketAndDestroy(nsNSSShutDownPreventionLock const&)] [@ PORT_FreeArena_Util | ssl3_CleanupPeerCerts | ssl3_DestroySSL3Info | ssl_DestroySocketContents | ssl_FreeSocket | ssl_DefClose | ssl_Close | nsNSSSocketInfo::CloseSocketAndDe…
I've just been able to replicate this for the first time today on b2g25 using a mozilla-central/master build of B2G.

The crash report is here:
https://crash-stats.mozilla.com/report/index/83a9edab-dff5-47b6-8105-7b4132130704

I'm not sure of the significance but I replicated it by running this test on a repeat loop until it failed:
https://github.com/mozilla/gaia-ui-tests/blob/master/gaiatest/tests/email/test_setup_basic_gmail.py

Perhaps the email/gmail app uses https and is triggering this bug.
(This crash has been observed during B2G v1.2 orangutan runs as well. Gecko SHA1: 85ead3614a3de104ca8b52c63a5b9b35c68feaa5)
(In reply to Michael Vines [:m1] [:evilmachines] from comment #8)
> (This crash has been observed during B2G v1.2 orangutan runs as well. Gecko
> SHA1: 85ead3614a3de104ca8b52c63a5b9b35c68feaa5)

This is going to be hard to block on - we've had this issue since 1.01 & general Firefox as well. Why this needed now instead of in 1.01 & 1.1?
(We've only seen this crash once now in ~500 hours of test on B2G v1.2, so yes I agree this bug need not block that release at the moment).
Just got this on v1.5

Enabled cell data then enabled wifi.

https://crash-stats.mozilla.com/report/index/4c7fe1c7-ddc7-4a55-a5f7-174732140417

Hamachi
Gaia      dadf0e60a6421f5b57ee9fc536c6617212805c19
Gecko     https://hg.mozilla.org/mozilla-central/rev/c55dfb01a027
BuildID   20140417040206
Version   31.0a1
ro.build.version.incremental=324
ro.build.date=Thu Dec 19 14:04:55 CST 2013
Crash encountered on Flame 1.4

Crash report here: https://crash-stats.mozilla.com/report/index/8cec7e3e-8b05-4861-bdbd-5eee12140515

Encountered this issue while attempting to load a site through the browser app and sliding down the notification tab.

1.4F Environmental Variables:
Device: Flame 1.4F
BuildID: 20140515000202
Gaia: 2e97bee6bb79d3577dba1bf2a1bbfcba64ee99ab
Gecko: 0cb91945f404
Version: 30.0
Firmware Version: v10F-3
Encountered this on Nightly Fx for Android - Galaxy Nexus with STR:
1. enter mozilla.org
2. tap settings -> Settings

https://crash-stats.mozilla.com/report/index/63a6fa70-3017-44f7-8906-3cbdc2140527
We hit this crash today on Flame 2.0 going through FTE (checking privacy links) on 2 devices.

Flame master m-c
BuildID: 20140527040202
Gaia: 6a391274cd436f8f0d1fad2db8c6b4805703259c
Gecko: cbe4f69c2e9c
Version: 32.0a1
v10G-2
This is the top crash more than double than the next nearest crasher. There is a significant regression in the crashes/100 ADI in Firefox 31 beta 1. This is a significant factor.
Assignee: nobody → cviecco
Crash Signature: nsNSSSocketInfo::CloseSocketAndDestroy(nsNSSShutDownPreventionLock const&)] → nsNSSSocketInfo::CloseSocketAndDestroy(nsNSSShutDownPreventionLock const&)] [@ nssCertificate_Destroy | CERT_DestroyCertList | mozilla::pkix::ScopedPtr<CERTCertListStr, &CERT_DestroyCertList>::~ScopedPtr()]
Initial analysis:

Inside CERT_DestroyCertificate(certs->cert) derefernce to c->decoded fails, after c is checked for not being null. This implies that while c is not null,  it is pointing to and arbitrary memory location.
This is called from ssl3_CleanupPeerCerts() which is iterating over ssl3.peerCertChain.

ssl3PeerCertChain is created in ssl3_HandleCertificate and the contents there are created by CERT_NewTempCertificate( http://mxr.mozilla.org/mozilla-central/source/security/nss/lib/ssl/ssl3con.c#9886here). 

there is a potential issue on line 399  where nssItem_Create as we dont check the return value of but that affects the value of c->decoded not c.
Another potential issue also on CERT_NewTempCertificate is that it checks if the cert might be added to the permanent database after it checks for this but before it returns. This should not affect the value returned tough.

Preparing patch to add more logging.
Flags: needinfo?(zcampbell)
Zac, Can you still reproduce it? I have  a flame device with me. STRS would be great.
I've not seen this bug for a long time (as I mostly work on b2g33). It was more of a monkey test that replicated it. If I had a precise STR I would have included it!
Flags: needinfo?(zcampbell)
See also: nsSocketTransportService uses NSS without implementing nsNSSShutDownObject contract, which I think is filed already but maybe not.
See Also: → 711786, 946159
Doug, can you find someone to take a look there please, see the comment #20 from bsmith.
Flags: needinfo?(dougt)
This adds a bunch of android logging to nss so that there is more information to solve this bug. This is not part of the official NSS so it needs to be removed from the source AND will go away on NSS updates.
Attachment #8447409 - Flags: review?(blassey.bugs)
try: -b do -p all -u xpcshell -t none
oops here is the actual pushlog

https://tbpl.mozilla.org/?tree=Try&rev=3c024e04cf65
Don't you want to push this to inbound/m-c?
Flags: needinfo?(dougt)
Note that we only see this in beta with enough frequency, which IMHO right now is is a state that I don't want to see us ship, and the time is running short of being able to still get things into that beta cycle. Also, didn't comment #20 already give us the info needed of what's to be done to fix this?

(Also, as a note, IMHO it's not good that this 31 regression was put onto an old bug, as those signatures probably will be seen for any bug in this group of not properly shutting down NSS. This is only one specific case of that happening.)
Robert, Comment 20 did not give us enough info for what is needed to fix this. It gave a suggestion, but without a reproducible case putting that potential fix would only will make the code more complex without any assurance of being fixed. The patch here is to allow us to have some logging to distinguish between the double free and not allocated case. Once we know that difference we can focus on fixing or papering over.

Doug. Yes I want to push to central, but I want a review from a mobile peer to ancknowlege the noise we will be placing into the android log.
Comment on attachment 8447409 [details] [diff] [review]
nss-loggingAndroid

Review of attachment 8447409 [details] [diff] [review]:
-----------------------------------------------------------------

this logging looks chatty. I'm fine with it going into nightly as long as we plan to pull it before uplift. If it goes beyond that it needs to be wrapped in an #ifdef RELEASE_BUILD
Attachment #8447409 - Flags: review?(blassey.bugs) → review+
(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #28)
> this logging looks chatty. I'm fine with it going into nightly as long as we
> plan to pull it before uplift. If it goes beyond that it needs to be wrapped
> in an #ifdef RELEASE_BUILD

I'm not sure if it's helpful at all if we do not have it in Android Beta, as the crashes only happen there with any usable frequency.
This is not happening at all on Nightly (33a1) over the last 28 days. Though with a population of 1,000 users it is not clear if this is resolved there. For the investigation code to be useful it will need to be uplifted to Aurora (32a2) where we see this ~1 crash a day. If the less frequent signatures associated with this bug are unrelated we would need to uplift to Beta (31b#) to get enough volume to provide useful feedback.
This looks to be resolved by the slew of NSS (NSS 3.16 upgraded to NSS 3.16.2) or maybe PKIX changes taken in beta 5 or 6. http://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=FENNEC_31_0b4_RELEASE&tochange=FENNEC_31_0b6_RELEASE

The if this holds true the diagnostic patch can be backed out of mozilla-central.
Coming to 5.5 hours after the latest push and I still dont know if the windows fixes are OK (build on all platforms and xpchell tests only)

https://tbpl.mozilla.org/?tree=Try&rev=2961f2cd90d5
https://hg.mozilla.org/mozilla-central/rev/580f5b55b410
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #28)
> Comment on attachment 8447409 [details] [diff] [review]
> nss-loggingAndroid
> 
> Review of attachment 8447409 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> this logging looks chatty. I'm fine with it going into nightly as long as we
> plan to pull it before uplift. If it goes beyond that it needs to be wrapped
> in an #ifdef RELEASE_BUILD

If this bug is fixed, we should remove the logging.
Comment on attachment 8447409 [details] [diff] [review]
nss-loggingAndroid

Camilo: this kind of patch needs to be documented in the
security/nss/patches directory, otherwise it may be
overwritten by a new NSS update.

If you don't need this patch any more, please revert it.
Thanks.
Camilo, are you still trying to find a fix for branches? If not, please back out this logging patch.
Flags: needinfo?(cviecco)
removed by changeset 3e672df388ed
Flags: needinfo?(cviecco)
Updating the tracking flags if I understood correctly the comment #31
These signatures are nowhere to be found in anything after Firefox/Fennec 31.0b4.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: