Closed Bug 706236 Opened 13 years ago Closed 13 years ago

SPDY Deadlock/SIGBUS

Categories

(Core :: Networking, defect)

x86_64
macOS
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla11

People

(Reporter: posidron, Assigned: mcmanus)

References

(Blocks 1 open bug)

Details

(Keywords: crash, testcase, Whiteboard: [sg:dos])

Attachments

(4 files)

Attached file callstack
This happens usually every time when sending single RST_STREAM frames as a response back to Firefox after SYN_STREAM requests. Please look at Thread 4 in the attached callstack.

RST_FRAME: 80 02 00 03 00 00 00 08 00 00 00 00 01

I will try to attach a separate testcase as soon as I am successful in reproducing the bug outside of Peach.

Marked as a security bug because I am currently not sure in how far a deadlock condition can be exploited in a Browser environment.
Attached file NSPR_LOG/SpdySession
Attached file testcase
Attaching a small testcase which works for me.

The testcase requires a patched TLSLite with NPN support:
http://xophdware.com/mozilla/3rdParty/tlslite-npn.tar.gz
Keywords: testcase
call it [sg:dos] unless someone can think of something more clever to do with a deadlock.
Whiteboard: [sg:dos]
(In reply to Christoph Diehl [:cdiehl] from comment #0)
> Created attachment 577729 [details]
> callstack
> 
> This happens usually every time when sending single RST_STREAM frames as a
> response back to Firefox after SYN_STREAM requests. Please look at Thread 4
> in the attached callstack.
> 
> RST_FRAME: 80 02 00 03 00 00 00 08 00 00 00 00 01
> 

just to be clear, you're testing something different than a rst_stream frame.

This is an incomplete frame. It has a length of 8 + 8 for the frame header - that's 16 and there are only 13 bytes here. It is being sent against stream 0 (which never exists) and the error code is incomplete, but it has a bit set in its MSB so it is going to be way out of range for a defined error code.

And then you're sending a close.

All of which is fair game for a fuzz test, of course, just wanted to be clear on what was being tested.

So far the test is WFM - it throws an error and life goes on without a deadlock. I'll look harder at your log to try and see what you saw, but its probably a timing issue related to a unexpected disconnect.

can you clear the security flag?
The forth byte is 03h, the spec says 03h is defined to be the type a RST_STREAM frame.
The NSPR logfile says: Control Frame Identified type 3 version 2

https://sites.google.com/a/chromium.org/dev/spdy/spdy-protocol/spdy-protocol-draft2#TOC-RST_STREAM
(In reply to Christoph Diehl [:cdiehl] from comment #5)
> The forth byte is 03h, the spec says 03h is defined to be the type a
> RST_STREAM frame.

Yes it is the right opcode. But the remainder of the frame is not valid for that opcode which is why I say this is not a valid RST_STREAM frame. the log message indicates that it is trying to parse one because the frame header (the first 8 bytes) are legit.

The length field is 8 (which it always must be for RST_STREAM, so that is correct). The interpretation of that is length + an implicit 8 for the control frame header for a total of 16. You send only 13 bytes and an EOF, so this isn't a complete frame.

Further the stream ID field is set to 0 (four bytes of 0 after the length field) - so it isn't addressing any valid stream. And the reason code field is the final 4 bytes, of which only 1 is present before the EOF. No valid reason code has a 1 in the MSB.

again, its a fuzz test so I have no problem with any of that. I'm just trying to put in the comment log for this bug that this isn't testing what happens when a during rst processing - it is testing a hangup during control frame parsing.
The log shows this:

183005184[10037b5c0]: SpdySession::LogIO 10fc64c00 stream=0 id=0x0 [Reading Control Frame]
183005184[10037b5c0]: 00000000: 00 00 00 00 01 
183005184[10037b5c0]: nsSocketInputStream::Read [this=19928990 count=3]
183005184[10037b5c0]:   calling PR_Read [count=3]
183005184[10037b5c0]:   PR_Read returned [n=-1]
183005184[10037b5c0]: ErrorAccordingToNSPR [in=-5961 out=804b0014]

804b0014 is NS_ERROR_NET_RESET.. 

it probably means your server generated a TCP RST instead of (or in addition to) the TCP FIN. It will do that if you close a socket with data unconsumed - in this case that's the request for "/" which the server never reads.

That kinda explains why this is WFM - timing associated with RST is pretty variable (if I process the close before the RST arrives, I won't see it). A tcpdump shows that the rst is generated for me but I bet I've already cleaned up everything by then.. hmm.
Group: core-security
Thank you for the additional information!
Yes, it may work on the first request but if you repeat it a few more times Firefox crashes constantly here.
Attached patch patch 0Splinter Review
Christoph, I was able to sorta-reproduce by hardcoding a few things to match what you were seeing in your log.

Can you test the attached patch before I ask for a code review of it, to make sure it really addresses the issue you identified?
Assignee: nobody → mcmanus
Attachment #578291 - Flags: feedback?(cdiehl)
Looks good, doesn't crash anymore. :)
Attachment #578291 - Flags: feedback?(cdiehl) → review?(honzab.moz)
Comment on attachment 578291 [details] [diff] [review]
patch 0

Review of attachment 578291 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab

::: netwerk/protocol/http/SpdySession.cpp
@@ +1507,5 @@
>                                mFrameDataSize - mFrameDataRead,
>                                countWritten);
>    if (NS_FAILED(rv)) {
> +    LOG3(("SpdySession %p buffering control frame read failure %x\n",
> +          this, rv));

BTW: adding a single \n at the end of the message log has no affect.
Attachment #578291 - Flags: review?(honzab.moz) → review+
Target Milestone: --- → mozilla11
https://hg.mozilla.org/mozilla-central/rev/3204b70435fe
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Backed out (along with the rest of the SPDY landing) in order to stop us hitting the MSVC virtual address limit, so we can reopen the trees (bug 709193).

Sucks, but we don't really have any other choice here :-(

https://hg.mozilla.org/integration/mozilla-inbound/rev/dc48c0992358
Status: RESOLVED → REOPENED
Depends on: 709193
Resolution: FIXED → ---
Target Milestone: mozilla11 → ---
Relanded on mozilla-central :-)

https://hg.mozilla.org/mozilla-central/rev/cf0b31ff2b6d
Status: REOPENED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla11
No longer depends on: 709193
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: