Last Comment Bug 753663 - crash in mozilla::net::SpdyStream::UpdateTransportReadEvents
: crash in mozilla::net::SpdyStream::UpdateTransportReadEvents
Status: RESOLVED FIXED
[spdy]
: crash, regression
Product: Core
Classification: Components
Component: Networking: HTTP (show other bugs)
: 13 Branch
: All All
: -- critical (vote)
: ---
Assigned To: Patrick McManus [:mcmanus]
:
Mentors:
: 756380 (view as bug list)
Depends on: 755467 756551
Blocks: 739522
  Show dependency treegraph
 
Reported: 2012-05-10 00:02 PDT by Scoobidiver (away)
Modified: 2012-08-13 04:31 PDT (History)
6 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
wontfix
verified
verified


Attachments

Description Scoobidiver (away) 2012-05-10 00:02:51 PDT
It first appeared in 15.0a1/20120507. The regression range might be:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=94ce5f33a9ea&tochange=448f554f6acb

Signature 	mozilla::net::SpdyStream::UpdateTransportReadEvents(unsigned int) More Reports Search
UUID	698c8deb-7e5e-4ced-9f6a-6e06f2120509
Date Processed	2012-05-09 20:07:37
Uptime	9248
Last Crash	1.2 weeks before submission
Install Age	4.2 hours since version was first installed.
Install Time	2012-05-09 15:54:21
Product	Firefox
Version	15.0a1
Build ID	20120509030514
Release Channel	nightly
OS	Windows NT
OS Version	6.1.7601 Service Pack 1
Build Architecture	x86
Build Architecture Info	GenuineIntel family 6 model 37 stepping 5
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0xa4
App Notes 	
AdapterVendorID: 0x10de, AdapterDeviceID: 0x1086, AdapterSubsysID: 15793842, AdapterDriverVersion: 8.17.13.124
D2D? D2D+ DWrite? DWrite+ D3D10 Layers? D3D10 Layers+ 
EMCheckCompatibility	True	
Total Virtual Memory	4294836224
Available Virtual Memory	3511443456
System Memory Use Percentage	26
Available Page File	14222860288
Available Physical Memory	6223749120

Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::net::SpdyStream::UpdateTransportReadEvents 	netwerk/protocol/http/SpdyStream.cpp:497
1 	xul.dll 	mozilla::net::SpdySession::HandleSynReplyForValidStream 	netwerk/protocol/http/SpdySession.cpp:1075
2 	xul.dll 	mozilla::net::SpdySession::HandleSynReply 	netwerk/protocol/http/SpdySession.cpp:1007
3 	xul.dll 	mozilla::net::SpdySession::WriteSegments 	netwerk/protocol/http/SpdySession.cpp:1725
4 	xul.dll 	nsHttpConnection::OnSocketReadable 	netwerk/protocol/http/nsHttpConnection.cpp:1386
5 	xul.dll 	nsHttpConnection::OnInputStreamReady 	netwerk/protocol/http/nsHttpConnection.cpp:1506
6 	xul.dll 	nsSocketInputStream::OnSocketReady 	netwerk/base/src/nsSocketTransport2.cpp:256
7 	xul.dll 	nsSocketTransport::OnSocketReady 	netwerk/base/src/nsSocketTransport2.cpp:1575
8 	xul.dll 	nsSocketTransportService::DoPollIteration 	netwerk/base/src/nsSocketTransportService2.cpp:762
9 	xul.dll 	nsSocketTransportService::Run 	netwerk/base/src/nsSocketTransportService2.cpp:645
10 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:656
11 	xul.dll 	nsThread::ThreadFunc 	xpcom/threads/nsThread.cpp:289
12 	nspr4.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:426
13 	nspr4.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c:122
14 	msvcr100.dll 	_callthreadstartex 	f:\dd\vctools\crt_bld\self_x86\crt\src\threadex.c:314
15 	msvcr100.dll 	_threadstartex 	f:\dd\vctools\crt_bld\self_x86\crt\src\threadex.c:292
16 	kernel32.dll 	BaseThreadInitThunk 	
17 	ntdll.dll 	__RtlUserThreadStart 	
18 	ntdll.dll 	_RtlUserThreadStart

More reports at:
https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Anet%3A%3ASpdyStream%3A%3AUpdateTransportReadEvents%28unsigned+int%29
Comment 1 Patrick McManus [:mcmanus] 2012-05-10 07:36:29 PDT
the first build this happens with is 050703 - which is built from
http://hg.mozilla.org/mozilla-central/rev/448f554f6acb

that means it is very likely a regression from

summary:     bug 726366 - support tunneling spdy through proxies
or
summary:     bug 751647 - speculative connect needs to check spdy ip pooling table 

seems most likely to be 726366
Comment 2 Patrick McManus [:mcmanus] 2012-05-10 07:46:13 PDT
disregard comment 1. Neither of those commits were in the first instance of this crash report. I just read the logs wrong.
Comment 3 Patrick McManus [:mcmanus] 2012-05-10 08:33:54 PDT
this is weird. hozna if you have thoughts they would be appreciated.

The stack is always death on mTransaction->OnTransportStatus() in

0 	xul.dll 	mozilla::net::SpdyStream::UpdateTransportReadEvents 	netwerk/protocol/http/SpdyStream.cpp:497
1 	xul.dll 	mozilla::net::SpdySession::HandleSynReplyForValidStream 	netwerk/protocol/http/SpdySession.cpp:1075
2 	xul.dll 	mozilla::net::SpdySession::HandleSynReply 	netwerk/protocol/http/SpdySession.cpp:1007


SpdyStream::mTransaction is a hard reference held for the life of the object. It should be valid. Because this crash is on the receipt of spdy-syn-reply the same transaction has already been used for creating the request so its very unlikely that it was an invalid ptr from the start. Further, this means it has done mTransaction->OnTransportStatus() from UpdateTransprortSendEvents previously without crashing.

SpdyStream itself is probably valid because it survived the line above in UpdateTransportReadEvents as well as the get/set fully open logic.
Comment 4 Patrick McManus [:mcmanus] 2012-05-12 06:49:00 PDT
crud, this has spread to ff14 and ff13.. that means it is almost certainly 739522 (the only spdy thing that has been ported to ff13).

its a low volume crash, and the fix was for an also uncommmon but serious issue. (how common is hard to know as it won't generate crash data).

understanding the crash would make it easier to decide what to do.
Comment 7 Patrick McManus [:mcmanus] 2012-05-16 08:49:33 PDT
172 total data points..

uptime  #-datapoints
-----   ------------
0             1
1-113         0
114-180       5
181-999     103
>999         63


That gap below 180 makes me think this is a session shutdown issue of some kind.. its hard to know when that will happen but most existing spdy services try and give you at least ~120 seconds and our idle timer is 180.

obviously the clock doesn't really start at uptime=0, it starts when you make a connection.. and its an idle clock not an elapsed time clock, but I still find the correlation consistent - I mean 2/3 of the data points are less than 1000 seconds of uptime but essentially none of them are before 114 seconds.

That could bode well for the patch in bug 755467 which basically reverts the recent changes to RestrictConnection() only in the case where there is a spdy session closing down (or not reusable in some way). That patch is in nightly (ff15) 0516-03 .. we'll see if it helps the crash stats.
Comment 8 Patrick McManus [:mcmanus] 2012-05-17 06:37:02 PDT
(In reply to Patrick McManus [:mcmanus] from comment #7)
>
> 
> That could bode well for the patch in bug 755467 which basically reverts the
> recent changes to RestrictConnection() only in the case where there is a
> spdy session closing down (or not reusable in some way). That patch is in
> nightly (ff15) 0516-03 .. we'll see if it helps the crash stats.

crud - negative result.

https://crash-stats.mozilla.com/report/index/0faa6270-564b-4ba7-99c2-3e92e2120517

is a crash of this signature with 051603
Comment 9 Scoobidiver (away) 2012-05-18 02:12:53 PDT
*** Bug 756380 has been marked as a duplicate of this bug. ***
Comment 10 Patrick McManus [:mcmanus] 2012-05-18 09:33:57 PDT
It seems likely that the SpdyStream being used is invalid. Perhaps deleted? or deleted twice?

This is a scenario that would match the data:

1 :: new stream 1 = ptr foo
2 :: delete {1, foo}
3 :: new stream 3 = ptr foo [foo could be reused by the allocator]
4 :: delete {1, foo} [bug!]
5 :: lookup stream 3.. get foo, deref foo and see crash

The delete of {1, foo} is really two steps.. mStreamIDHash.Remove(1) is a nop because 1 isn't in the ID hash anymore, and mTransactionHash.Remove(foo), which succeeds by accidentally removing (and dtoring) stream 3. The subsequent lookup for stream 3 gets a ptr that has been destroyed..
Comment 11 Honza Bambas (:mayhemer) 2012-05-18 09:51:19 PDT
Good theory.  I have to confirm that Windows system very much likes to reuse memory for a class object allocation that has just been freed for an object of that same class.  It might be specific to jemalloc or to Win directly. (I was experiencing a bug that was hidden on my local machine because of a different mem allocation strategy of debug builds).

Based on the last sentence: did you try to reproduce merely on Windows with a jemalloc optimized build?  Best with nightly build.
Comment 12 Patrick McManus [:mcmanus] 2012-05-18 10:01:33 PDT
(In reply to Honza Bambas (:mayhemer) from comment #11)
> 
> Based on the last sentence: did you try to reproduce merely on Windows with
> a jemalloc optimized build?  Best with nightly build.

I have tried windows nightlies and haven't repro'd. But the crash isn't all that common compared to the number of users so without any kind of STR I'm not surprised.

it is interesting that there are ~260 crashes and they are 100% windows. Obviously a strong windows skew is expected, but at that point you'd expect OS X to make an appearance or two.

fwiw (and its not much) there haven't been any nightly crashes yet from the may-17 build which had those other 3 fixes in it.

I'm going to do a patch that basically looks for index inconsistencies when doing deletes or looking up by stream id. That might at least keep us on the rails while we work out the underlying issue.
Comment 13 Scoobidiver (away) 2012-05-18 10:15:01 PDT
(In reply to Patrick McManus [:mcmanus] from comment #12)
> but at that point you'd expect OS X to make an appearance or two.
OS X is proud to crash like Windows: https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Anet%3A%3ASpdyStream%3A%3AUpdateTransportReadEvents
Comment 14 Patrick McManus [:mcmanus] 2012-05-18 10:23:53 PDT
(In reply to Scoobidiver from comment #13)
> (In reply to Patrick McManus [:mcmanus] from comment #12)
> > but at that point you'd expect OS X to make an appearance or two.
> OS X is proud to crash like Windows:

oddly comforting!
Comment 15 Patrick McManus [:mcmanus] 2012-05-24 06:54:28 PDT
a slighty less common variant makes it pretty clear that stream is bogus, probably already freed.

0 	xul.dll 	mozilla::net::SpdyStream::Close 	netwerk/protocol/http/SpdyStream.cpp:789
1 	xul.dll 	mozilla::net::SpdySession::CleanupStream 	netwerk/protocol/http/SpdySession.cpp:909
2 	xul.dll 	mozilla::net::SpdySession::HandleSynReply 	netwerk/protocol/http/SpdySession.cpp:1011
Comment 16 Patrick McManus [:mcmanus] 2012-05-27 10:24:08 PDT
bug 756551 has resolved the crash. I talked with alex via email and he indicated that patch wouldn't be accepted for ff13 at this time due to the late stage of that release.

Note You need to log in before you can comment on or make changes to this bug.