Last Comment Bug 706236 - SPDY Deadlock/SIGBUS
: SPDY Deadlock/SIGBUS
Status: RESOLVED FIXED
[sg:dos]
: crash, testcase
Product: Core
Classification: Components
Component: Networking (show other bugs)
: Trunk
: x86_64 Mac OS X
: -- critical (vote)
: mozilla11
Assigned To: Patrick McManus [:mcmanus] PTO until Sep 6
:
Mentors:
: 707668 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-11-29 13:02 PST by Christoph Diehl [:posidron]
Modified: 2012-02-01 13:57 PST (History)
3 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
callstack (144.89 KB, text/plain)
2011-11-29 13:02 PST, Christoph Diehl [:posidron]
no flags Details
NSPR_LOG/SpdySession (150.56 KB, text/plain)
2011-11-30 08:01 PST, Christoph Diehl [:posidron]
no flags Details
testcase (3.21 KB, application/zip)
2011-11-30 08:59 PST, Christoph Diehl [:posidron]
no flags Details
patch 0 (3.93 KB, patch)
2011-12-01 09:51 PST, Patrick McManus [:mcmanus] PTO until Sep 6
honzab.moz: review+
Details | Diff | Splinter Review

Description Christoph Diehl [:posidron] 2011-11-29 13:02:34 PST
Created attachment 577729 [details]
callstack

This happens usually every time when sending single RST_STREAM frames as a response back to Firefox after SYN_STREAM requests. Please look at Thread 4 in the attached callstack.

RST_FRAME: 80 02 00 03 00 00 00 08 00 00 00 00 01

I will try to attach a separate testcase as soon as I am successful in reproducing the bug outside of Peach.

Marked as a security bug because I am currently not sure in how far a deadlock condition can be exploited in a Browser environment.
Comment 1 Christoph Diehl [:posidron] 2011-11-30 08:01:07 PST
Created attachment 577963 [details]
NSPR_LOG/SpdySession
Comment 2 Christoph Diehl [:posidron] 2011-11-30 08:59:18 PST
Created attachment 577986 [details]
testcase

Attaching a small testcase which works for me.

The testcase requires a patched TLSLite with NPN support:
http://xophdware.com/mozilla/3rdParty/tlslite-npn.tar.gz
Comment 3 Daniel Veditz [:dveditz] 2011-11-30 16:30:09 PST
call it [sg:dos] unless someone can think of something more clever to do with a deadlock.
Comment 4 Patrick McManus [:mcmanus] PTO until Sep 6 2011-12-01 07:17:08 PST
(In reply to Christoph Diehl [:cdiehl] from comment #0)
> Created attachment 577729 [details]
> callstack
> 
> This happens usually every time when sending single RST_STREAM frames as a
> response back to Firefox after SYN_STREAM requests. Please look at Thread 4
> in the attached callstack.
> 
> RST_FRAME: 80 02 00 03 00 00 00 08 00 00 00 00 01
> 

just to be clear, you're testing something different than a rst_stream frame.

This is an incomplete frame. It has a length of 8 + 8 for the frame header - that's 16 and there are only 13 bytes here. It is being sent against stream 0 (which never exists) and the error code is incomplete, but it has a bit set in its MSB so it is going to be way out of range for a defined error code.

And then you're sending a close.

All of which is fair game for a fuzz test, of course, just wanted to be clear on what was being tested.

So far the test is WFM - it throws an error and life goes on without a deadlock. I'll look harder at your log to try and see what you saw, but its probably a timing issue related to a unexpected disconnect.

can you clear the security flag?
Comment 5 Christoph Diehl [:posidron] 2011-12-01 07:37:04 PST
The forth byte is 03h, the spec says 03h is defined to be the type a RST_STREAM frame.
The NSPR logfile says: Control Frame Identified type 3 version 2

https://sites.google.com/a/chromium.org/dev/spdy/spdy-protocol/spdy-protocol-draft2#TOC-RST_STREAM
Comment 6 Patrick McManus [:mcmanus] PTO until Sep 6 2011-12-01 07:49:02 PST
(In reply to Christoph Diehl [:cdiehl] from comment #5)
> The forth byte is 03h, the spec says 03h is defined to be the type a
> RST_STREAM frame.

Yes it is the right opcode. But the remainder of the frame is not valid for that opcode which is why I say this is not a valid RST_STREAM frame. the log message indicates that it is trying to parse one because the frame header (the first 8 bytes) are legit.

The length field is 8 (which it always must be for RST_STREAM, so that is correct). The interpretation of that is length + an implicit 8 for the control frame header for a total of 16. You send only 13 bytes and an EOF, so this isn't a complete frame.

Further the stream ID field is set to 0 (four bytes of 0 after the length field) - so it isn't addressing any valid stream. And the reason code field is the final 4 bytes, of which only 1 is present before the EOF. No valid reason code has a 1 in the MSB.

again, its a fuzz test so I have no problem with any of that. I'm just trying to put in the comment log for this bug that this isn't testing what happens when a during rst processing - it is testing a hangup during control frame parsing.
Comment 7 Patrick McManus [:mcmanus] PTO until Sep 6 2011-12-01 08:16:46 PST
The log shows this:

183005184[10037b5c0]: SpdySession::LogIO 10fc64c00 stream=0 id=0x0 [Reading Control Frame]
183005184[10037b5c0]: 00000000: 00 00 00 00 01 
183005184[10037b5c0]: nsSocketInputStream::Read [this=19928990 count=3]
183005184[10037b5c0]:   calling PR_Read [count=3]
183005184[10037b5c0]:   PR_Read returned [n=-1]
183005184[10037b5c0]: ErrorAccordingToNSPR [in=-5961 out=804b0014]

804b0014 is NS_ERROR_NET_RESET.. 

it probably means your server generated a TCP RST instead of (or in addition to) the TCP FIN. It will do that if you close a socket with data unconsumed - in this case that's the request for "/" which the server never reads.

That kinda explains why this is WFM - timing associated with RST is pretty variable (if I process the close before the RST arrives, I won't see it). A tcpdump shows that the rst is generated for me but I bet I've already cleaned up everything by then.. hmm.
Comment 8 Christoph Diehl [:posidron] 2011-12-01 08:32:23 PST
Thank you for the additional information!
Yes, it may work on the first request but if you repeat it a few more times Firefox crashes constantly here.
Comment 9 Patrick McManus [:mcmanus] PTO until Sep 6 2011-12-01 09:51:14 PST
Created attachment 578291 [details] [diff] [review]
patch 0

Christoph, I was able to sorta-reproduce by hardcoding a few things to match what you were seeing in your log.

Can you test the attached patch before I ask for a code review of it, to make sure it really addresses the issue you identified?
Comment 10 Christoph Diehl [:posidron] 2011-12-01 12:17:08 PST
Looks good, doesn't crash anymore. :)
Comment 11 Patrick McManus [:mcmanus] PTO until Sep 6 2011-12-05 07:21:39 PST
*** Bug 707668 has been marked as a duplicate of this bug. ***
Comment 12 Honza Bambas (:mayhemer) 2011-12-06 03:07:01 PST
Comment on attachment 578291 [details] [diff] [review]
patch 0

Review of attachment 578291 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab

::: netwerk/protocol/http/SpdySession.cpp
@@ +1507,5 @@
>                                mFrameDataSize - mFrameDataRead,
>                                countWritten);
>    if (NS_FAILED(rv)) {
> +    LOG3(("SpdySession %p buffering control frame read failure %x\n",
> +          this, rv));

BTW: adding a single \n at the end of the message log has no affect.
Comment 13 Patrick McManus [:mcmanus] PTO until Sep 6 2011-12-06 10:26:43 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/3204b70435fe
Comment 14 Ed Morley [:emorley] 2011-12-07 02:48:06 PST
https://hg.mozilla.org/mozilla-central/rev/3204b70435fe
Comment 15 Ed Morley [:emorley] 2011-12-10 15:34:29 PST
Backed out (along with the rest of the SPDY landing) in order to stop us hitting the MSVC virtual address limit, so we can reopen the trees (bug 709193).

Sucks, but we don't really have any other choice here :-(

https://hg.mozilla.org/integration/mozilla-inbound/rev/dc48c0992358
Comment 16 Ed Morley [:emorley] 2011-12-14 11:14:31 PST
Relanded on mozilla-central :-)

https://hg.mozilla.org/mozilla-central/rev/cf0b31ff2b6d

Note You need to log in before you can comment on or make changes to this bug.