Closed Bug 803267 Opened 12 years ago Closed 12 years ago

crash in nsSocketTransportService::DetachSocket

Categories

(Core :: Networking, defect)

19 Branch
x86
Windows 7
defect
Not set
critical

RESOLVED WORKSFORME
Tracking Status
firefox18 --- unaffected
firefox19 - affected

People

(Reporter: scoobidiver, Unassigned)

Details

(Keywords: crash, regression)

Crash Data

It spiked from 19.0a1/20121018. The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=dac5700acf8b&tochange=cb573b9307e5
It might be a regression from bug 766817 (uplifted to Aurora and Beta).

Signature 	nsSocketTransportService::DetachSocket(nsSocketTransportService::SocketContext*, nsSocketTransportService::SocketContext*)
UUID	f923e4af-f426-4f88-9aeb-338dd2121018
Date Processed	2012-10-18 18:15:31
Uptime	729
Last Crash	2.1 days before submission
Install Age	50.8 minutes since version was first installed.
Install Time	2012-10-18 17:24:35
Product	Firefox
Version	19.0a1
Build ID	20121018030618
Release Channel	nightly
OS	Windows NT
OS Version	6.1.7601 Service Pack 1
Build Architecture	x86
Build Architecture Info	AuthenticAMD family 16 model 2 stepping 3
Crash Reason	EXCEPTION_ACCESS_VIOLATION_EXEC
Crash Address	0xc5e3d00
App Notes 	
AdapterVendorID: 0x10de, AdapterDeviceID: 0x0ca3, AdapterSubsysID: 072e10de, AdapterDriverVersion: 8.17.13.142
D2D? D2D+ DWrite? DWrite+ D3D10 Layers? D3D10 Layers+ 
EMCheckCompatibility	True
Adapter Vendor ID	0x10de
Adapter Device ID	0x0ca3
Total Virtual Memory	2147352576
Available Virtual Memory	1554419712
System Memory Use Percentage	41
Available Page File	4214034432
Available Physical Memory	1731235840

Frame 	Module 	Signature 	Source
0 		@0xc5e3d00 	
1 	xul.dll 	nsSocketTransportService::DetachSocket 	netwerk/base/src/nsSocketTransportService2.cpp:187
2 	xul.dll 	nsSocketTransportService::DoPollIteration 	netwerk/base/src/nsSocketTransportService2.cpp:817
3 	xul.dll 	nsSocketTransportService::Run 	netwerk/base/src/nsSocketTransportService2.cpp:645
4 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:612
5 	xul.dll 	nsThread::ThreadFunc 	xpcom/threads/nsThread.cpp:256
6 	nspr4.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:395
7 	nspr4.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c:90
8 	msvcr100.dll 	_callthreadstartex 	f:\dd\vctools\crt_bld\self_x86\crt\src\threadex.c:314
9 	msvcr100.dll 	_threadstartex 	f:\dd\vctools\crt_bld\self_x86\crt\src\threadex.c:292
10 	kernel32.dll 	BaseThreadInitThunk 	
11 	ntdll.dll 	__RtlUserThreadStart 	
12 	ntdll.dll 	_RtlUserThreadStart

More reports at:
https://crash-stats.mozilla.com/report/list?signature=nsSocketTransportService%3A%3ADetachSocket%28nsSocketTransportService%3A%3ASocketContext*%2C+nsSocketTransportService%3A%3ASocketContext*%29
With combined signatures, it's the #3 top crasher in today's build.
Crash Signature: [@ nsSocketTransportService::DetachSocket(nsSocketTransportService::SocketContext*, nsSocketTransportService::SocketContext*)] → [@ nsSocketTransportService::DetachSocket(nsSocketTransportService::SocketContext*, nsSocketTransportService::SocketContext*)] [@ nsSocketTransport::Release()] [@ @0x0 | nsSocketTransport::Release()] [@ nsSocketTransport::~nsSocketTransport()]
Keywords: topcrash
Bug 766817 landed on aurora on 10/18 - that is also built nightly, right? Crash reports don't seem to show any instances of this on 18, while it is prevalent on 19 - so 766817 seems like a less likely cause.

This crash has been on and off on many releases for a while, but something changed in 19 to drive it through the roof.
Bug 802378 could be a shot in the dark here.
Brian, how would you feel about backing out 802378 for a couple days just to look at the crash stats? It doesn't look like a critical patch, but I wouldn't know... I don't have a better suggestion at the moment.

The most common form of crash is an exec error at nsSocketTransportService2.cpp:180
    sock->mHandler->OnSocketDetached(sock->mFD);

Probably a corrupt mHandler.
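
For readers unfamiliar with this failure mode: the call above is a virtual dispatch through mHandler, so if the handler object has already been destroyed, the CPU can end up jumping to a garbage address, which would be consistent with an EXEC access violation. Below is a minimal sketch of the usual defensive pattern (hold a strong reference across the callback), using std::shared_ptr as a stand-in for Mozilla's ref-counted pointers. Handler, SocketContext and this DetachSocket are hypothetical simplifications for illustration, not the actual necko code:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the socket handler interface.
struct Handler {
    int detached = 0;   // how many times the callback ran
    int lastFd = -1;    // fd passed to the most recent callback
    virtual ~Handler() = default;
    virtual void OnSocketDetached(int fd) {
        ++detached;
        lastFd = fd;
    }
};

// Hypothetical stand-in for SocketContext: it owns a strong
// reference to its handler, so the handler cannot be destroyed
// while the context is still in the list.
struct SocketContext {
    int fd;
    std::shared_ptr<Handler> handler;
};

// Notify the handler, then drop the context from the list.
// Returns true if a handler was notified.
bool DetachSocket(std::vector<SocketContext>& list, std::size_t index) {
    if (index >= list.size()) {
        return false;
    }
    // Take a local strong reference first, so the handler stays
    // alive for the whole callback even if the callback itself
    // causes other references to be released.
    std::shared_ptr<Handler> keepAlive = list[index].handler;
    if (keepAlive) {
        keepAlive->OnSocketDetached(list[index].fd);
    }
    list.erase(list.begin() + static_cast<std::ptrdiff_t>(index));
    return keepAlive != nullptr;
}
```

If the context held only a raw pointer and some other code path released the last reference early, the same virtual call would read a stale vtable, which is the kind of corruption suspected here.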

From the regression range here are the things that seem tangentially related:

* Two patches of mine, 766817 and 785050. These also went to aurora (ff18) on 10/18 and the crash does not appear on aurora at all. I believe aurora gets pushed out pretty much as quickly as nightly (correct me if I'm wrong). It's not exactly proof of innocence, as there could be some other ff19 interaction, but it's reasonable doubt :). These are also corner-case code paths that are hard to square with a top crasher (one involves using Windows integrated auth and the other websockets that received an HTTP auth reply of 401 or 407).

* A few rtcweb things, but that's preffed off and I just have a hard time believing that there is enough use of it to generate that many crashes. But maybe there is an early adopter cult following. I have seen weird bugs generate high-ranking crashes before (e.g. I made a compilation error in your PAC file crash nightly, and to my surprise that was immediately a top 10 crasher), which is a tribute to the general stability of nightly I suppose.

* 802378. My primary suspicion is really just guilt by association, so it's not a strong suspicion. But the patch changes a whole bunch of stuff from nsRefPtr to mozilla::RefPtr. I'm not really clear on what the differences are there (thread safety issues maybe?) but a bad reference count could certainly be at the root of the issue. In its defense it doesn't directly handle the classes in question, but neither does anything in the regression range. And PSM certainly is in the mix as an NSPR layer.
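
To make the "bad reference count" theory in the last bullet concrete, here is a minimal sketch of an intrusive reference count, assuming a simplified class that deletes itself when the count reaches zero. Counted and its destroyed flag are hypothetical and exist only for illustration. This is not the implementation of nsRefPtr or mozilla::RefPtr, and it does not claim to show how the two differ:

```cpp
#include <atomic>

// Hypothetical intrusively ref-counted object. The destructor is
// private, so the object can only die through Release().
class Counted {
public:
    // destroyedFlag is set to true when the object is destroyed,
    // so a caller can observe destruction without touching freed
    // memory.
    explicit Counted(bool* destroyedFlag) : mDestroyed(destroyedFlag) {}

    void AddRef() { mRefCnt.fetch_add(1, std::memory_order_relaxed); }

    void Release() {
        // fetch_sub returns the previous value; 1 means this call
        // dropped the last reference.
        if (mRefCnt.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete this;
        }
    }

    unsigned RefCount() const { return mRefCnt.load(); }

private:
    ~Counted() { *mDestroyed = true; }

    std::atomic<unsigned> mRefCnt{0};
    bool* mDestroyed;
};
```

With two AddRef calls, the object survives the first Release and is destroyed by the second. If one holder calls Release once too often, or an AddRef is lost, the object is destroyed while another holder still has a pointer to it, and that holder's next virtual call goes through freed memory.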
Flags: needinfo?(bsmith)
With Brian's go-ahead, I backed out 802378 as a candidate for causing this.

If that doesn't work we'll test 766817 and 785050.

 https://hg.mozilla.org/integration/mozilla-inbound/rev/82d78390fda1
Whiteboard: [leave open]
There are no crashes in 19.0a1/20121023. The working range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=1c3e4cb1f754&tochange=48502b61a63e
The backout of bug 802378 landed after this window, so bug 802378 should be pushed again.
Flags: needinfo?(bsmith)
Keywords: topcrash
(In reply to Scoobidiver from comment #7)
> There are no crashes in 19.0a1/20121023. The working range is:
> http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=1c3e4cb1f754&tochange=48502b61a63e

Weird, but still true now for the nightly channel. I've looked at that range and I can't explain it. Some far-out possibilities:

* bluetooth changes... I don't think they use the STS poll mechanism, but maybe I'm wrong

* a fix to the necko shutdown logic - that could logically be quite related to the crash signature, but that problem has existed for a long time and these crashes are new. Perhaps it fixes an underlying problem for which a trigger was only recently added.

let's close this and reopen if they appear again.


> The backout of bug 802378 landed after this window, so bug 802378 should be
> pushed again.

done.
 https://hg.mozilla.org/integration/mozilla-inbound/rev/9a4531d2d243
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Whiteboard: [leave open]