Closed Bug 459524 Opened 16 years ago Closed 16 years ago

SOCKS5 reply with DNS name longer than 15 characters corrupts data


(Core :: Networking, defect)

1.9.0 Branch
Not set



Tracking Status
blocking1.9.1 --- .2+
status1.9.1 --- .2-fixed


(Reporter: andrej.andolsek, Assigned: nelson)


(Keywords: fixed1.9.0.12, Whiteboard: [sg:low])


(1 file, 2 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/2008092417 Firefox/3.0.3
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/2008092417 Firefox/3.0.3

If Firefox receives SOCKS5 reply with DNS name longer than 15 characters then the subsequent data transfer will be corrupted. This will result with either pages being corrupted at the beginning, or with a Save As dialog being opened.

This quite possibly is a buffer overrun bug! And quite possibly it's an exploitable one too (that's why I'm classifying it as critical bug). It looks like Firefox can handle SOCKS5 replies up to 22 bytes in size - 22 bytes is needed for SOCKS5 reply holding IPv6 address. SOCKS5 reply with 15 character DNS name takes 22 bytes as well. Btw the maximum SOCKS5 reply is 262 bytes.

Reproducible: Always

Steps to Reproduce:
I've checked the code and it it's not a buffer overrun bug. In netwerk\socket\base\nsSOCKSIOLayer.cpp - function ConnectSOCKS5():

     case 0x03: // FQDN (should not get this back)
        default: // unknown format
            // if we get here, we don't know our external address.
            // however, as that's possibly not critical to the user,
            // we let it slide.
            PR_InitializeNetAddr(PR_IpAddrNull, 0, extAddr);
            //return NS_ERROR_FAILURE;

0x03 - DOMAINNAME can indeed be sent back, but it is highly unusual. 

Let me explain how this can happen. Tunnelier is an SSH client that uses SOCKS proxy forwarding. Tunnelier acts as SOCKS proxy server, but connections to actual targets are made by SSH server. So when generating SOCKS5 reply Tunnelier must use the address of SSH server. Now comes the tricky part. If Tunnelier itself is connected through (regular) proxy server, and Tunnelier is configured not to resolve DNS name locally, then Tunnelier doesn't know the IP address of SSH server. Only in this case will Tunnelier use DNS name in SOCKS5 reply.

I propose the following code changes:
1) case 0x03 should simply skip bytes remaining - response_len = 4 + 1 + response[4] + 2
2) default should yield failure, because you don't know how much bytes there is remaining in SOCKS5 reply.

Also I got a couple of more questions:
1) How come the code is happily reading response bytes without checking how much bytes it's been actually read? (What if PR_Recv() read only 1 byte?)
2) What if 10 byte SOCKS5 reply is returned (IPv4) and PR_Recv() reads whole 22 bytes? Well OK, HTTP server waits on HTTP request before sending anything, so that's why this works - but it doesn't look clean.
Severity: critical → minor
I agree there's no buffer overrun here.  
If the socks response is larger that the largest size this code expects,
the excess response is left unread in the kernel's socket receive buffers,
where it will appear as a prefix to the next set of expected received data
(e.g. to the subsequent http response).  

There could be an underrun.  A "short read" could fail to read the entire
response into the reader's buffer before trying to parse it.  The actual 
amount of data read in is ignored, other than to assure it's not -1.
The code should deal with "short reads" properly.

Remember, http isn't the only protocol used over socks, and some protocols
are server-speaks-first, so reading in more data into the response buffer
than are actually contained in the socks response could cause some of 
the initial data from the desired remote server to be read into the socks
response buffer and then discarded.  

The right way to do this (IMO) is to initially read in 4 bytes, the header of 
the socks response, then interpret the 4th byte to see the type of response
and its proper length.  If that 4th byte is 03, then read in the 5th byte
for the length.  Finally read in the rest of the response according to the
length determined.  Deal with "short reads" at every step.  This ensures
that no underruns occur, and we don't read in any of the target server's
data by assuming it to be part of the socks response.
Attached patch proposed patch (untested) v1 (obsolete) — Splinter Review
Reed, you've worked on this code before.  Please review
Assignee: nobody → nelson
Ever confirmed: true
Attachment #342755 - Flags: review?
Reed, please review the patch I attached to this bug.
I think you're probably wanting biesi.
Assignee: nelson → nobody
Group: mozilla-confidential
Component: General → Networking
Product: Firefox → Core
QA Contact: general → networking
Group: mozilla-confidential
I apparently fail at moving bugs. Sorry about the mess.
Assignee: nobody → nelson
OS: Windows XP → All
Hardware: PC → All
Version: unspecified → 1.9.0 Branch
Attachment #342755 - Flags: superreview?(cbiesinger)
Attachment #342755 - Flags: review?(cbiesinger)
Attachment #342755 - Flags: review?
hm, bug 353040 seems to have a pretty similar patch...
Whiteboard: [sg:low]
(In reply to comment #7)
> hm, bug 353040 seems to have a pretty similar patch...

The patches are basically the same. There is a difference in timeout handling.

             PR_InitializeNetAddr(PR_IpAddrNull, 0, extAddr);
-            //return NS_ERROR_FAILURE;
+            extPort = response[response_len - 2] << 8) | 
+	              response[response_len - 1];

This is unnecessary since extPort isn't used anywhere.
The patches are similar, but don't address the same set of issues exactly.
Michal's also attempts to address correcting timeout computations, which
seems good.  Mine also attempts to address a couple issues in parsing the 
SOCKS5 response while not increasing the number of PR_Recv calls more than

Michal spotted a bug in my patch.  extPort SHOULD be used.  It should be 
computed before the call to PR_InitializeNetAddr, and the value should be
used in that call.  e.g. 

-            PR_InitializeNetAddr(PR_IpAddrNull, 0, extAddr);
-            //return NS_ERROR_FAILURE;
+            extPort = response[response_len - 2] << 8) | 
+                  response[response_len - 1];
+            PR_InitializeNetAddr(PR_IpAddrNull, extPort, extAddr);

This should be done for the FQDN case, only, as my patch does.
Comment on attachment 342755 [details] [diff] [review]
proposed patch (untested) v1

Let's not let this sg bug sit. Is this attachment ready for review? We can find someone else to look at it if need be.
Comment on attachment 342755 [details] [diff] [review]
proposed patch (untested) v1

r+sr=biesi, sorry for the delay. do make the change from comment 9 before checking this in.
Attachment #342755 - Flags: superreview?(cbiesinger)
Attachment #342755 - Flags: superreview+
Attachment #342755 - Flags: review?(cbiesinger)
Attachment #342755 - Flags: review+
Attached patch patch v2 (still untested) (obsolete) — Splinter Review
This patch is v1 plus the changes I suggested previously. 

Do we have any one who can test this with a real socks server to ensure
no regressions?  My employer has stopped using socks servers and I no 
longer have any with which to test.
(In reply to comment #12)

I'll try to test it.
(In reply to comment #12)
> Created an attachment (id=364261) [details]
> patch v2 (still untested)

+  extPort = response[response_len - 2] << 8) |

A bracket is missing before response.

I've tested it with antinat and SSH's SOCK server and found no regression. My SSH (openssh-5.1p1-3.fc10.x86_64) connected through proxy as described in comment #1 doesn't return response with FQDN but also IPv4 with both ip and port zeroed. I've build my own patched version of antinat to test this potential problem and it works correctly. I've also tested "byte by byte" relaying (bug #353040) and it also works.
Thanks for testing this patch.  I'm carrying forward the r+ from the 
previous one (I understand that's OK to do), and requesting checkin.
Attachment #342755 - Attachment is obsolete: true
Attachment #364261 - Attachment is obsolete: true
Attachment #364525 - Flags: review+
Flags: wanted1.9.1?
Flags: wanted1.9.0.x+
Flags: blocking1.9.1?
Flags: blocking1.9.0.8?
Not blocking 1.9.0 but wanted. Since we have no testcase and no testing infrastructure for this taking a branch patch without broader trunk testing is a bit scary.
Flags: blocking1.9.0.8? → blocking1.9.0.8-
Flags: in-testsuite?
I've set checkin-needed on this bug.  I'm not sure of all the remaining 
protocol steps required to get the checkin done.  If it needs to be checked
in to Hg first, someone else will have to do that.
Doesn't block, but once baked please nominate for approval.
Flags: blocking1.9.1? → blocking1.9.1-
Closed: 16 years ago
Flags: wanted1.9.1? → wanted1.9.1+
Resolution: --- → FIXED
Attachment #364525 - Flags: approval1.9.1?
Comment on attachment 364525 [details] [diff] [review]
patch v3 - tested

It's baked, requesting approval.
Attachment #364525 - Flags: approval1.9.0.9?
Attachment #364525 - Flags: approval1.9.1? → approval1.9.1+
bug 487967 comment 9 implies that we should try backing this one out first
Yep -- can today's sheriff do so?

(Maybe it could reland with tests, too. :-( )
Yeah, I can back it out.

This is a tiny regression in absolute terms (<2ms) so I'm a little worried about the noise floor, but on the graph it's visible, so I'm hoping the backout effect will be similarly evident.

Cancelling the fixed1.9.1 keyword instead of re-opening, since this is still landed on trunk (where it doesn't seem to have caused any Tp blip at all:,395020,395048,912148,1431867&sel=1239089415,1239637970 )
Keywords: fixed1.9.1
Looks like the backout did bring numbers down, though again, it's noisy.[{%22test%22:%221%22,%22branch%22:%223%22,%22machine%22:%2235%22},{%22test%22:%222%22,%22branch%22:%223%22,%22machine%22:%2232%22},{%22test%22:%222%22,%22branch%22:%223%22,%22machine%22:%2233%22},{%22test%22:%222%22,%22branch%22:%223%22,%22machine%22:%2234%22},{%22test%22:%2211%22,%22branch%22:%223%22,%22machine%22:%2248%22}]

There's all the XP boxes doing Tp (including a nochrome and a tp_fast box). If you look at 4/14, around 10:30AM, you can see the drop. NoChrome, for instance, drops from 150.58 to 147.93.  Unfortunately it looks like something else regressed Tp not long afterwards (and then maybe was backed out or counterbalanced by a win?)

It's a mess, but I do think the data supports the theory that this patch has a Tp hit on XP.

Is that expected and understood and worth it, or mystical?
Absolutely mystical: I'm almost positive our tests don't hit SOCKS proxies at all, which would mean this code isn't even run.
I imagined that these page load time results must be using proxies, 
because why else would socks proxy code be getting called?  

If this code is not being called and it affects page load times, then it does 
border on mystical.  But I've seen situations where the mere insertion of a 
single line of code causes other code to be moved in the process address space, such that new and frequent collisions occur in the instruction cache. On some 
platforms, such as Solaris, there are optimization techniques that can greatly 
reduce i-cache collisions. I'm not aware of any for Windows though.
Whiteboard: [sg:low] → [sg:low] needs 1.9.1 landing
Attachment #364525 - Flags: approval1.9.0.10? → approval1.9.0.11?
Comment on attachment 364525 [details] [diff] [review]
patch v3 - tested

We can try again for 3.5.1, but right now it's time to minimize churn for 3.5 so revoking approval.
Attachment #364525 - Flags: approval1.9.1+ → approval1.9.1-
Flags: wanted1.9.1.x+
Whiteboard: [sg:low] needs 1.9.1 landing → [sg:low]
Comment on attachment 364525 [details] [diff] [review]
patch v3 - tested

Approved for, a=dveditz for release-drivers
Attachment #364525 - Flags: approval1.9.0.12? → approval1.9.0.12+
Dan, please commit the patch in the 1.9.0 repository.  Thanks.
Keywords: checkin-needed
Whiteboard: [sg:low] → [sg:low][needs checkin on 1.9.0]
Checking in netwerk/socket/base/nsSOCKSIOLayer.cpp;
/cvsroot/mozilla/netwerk/socket/base/nsSOCKSIOLayer.cpp,v  <--  nsSOCKSIOLayer.cpp
new revision: 1.21; previous revision: 1.20
Whiteboard: [sg:low][needs checkin on 1.9.0] → [sg:low]
Fixed this in, so we should fix it in
Flags: blocking1.9.1.1+
Attachment #364525 - Flags: approval1.9.1.1+
Comment on attachment 364525 [details] [diff] [review]
patch v3 - tested

Approved for a=ss

Dan, can you please land this on as well?
blocking1.9.1: --- → .2+
Flags: blocking1.9.1.1+ → blocking1.9.1.1-
Comment on attachment 364525 [details] [diff] [review]
patch v3 - tested

Let's make that :)
Attachment #364525 - Flags: approval1.9.1.2+
Attachment #364525 - Flags: approval1.9.1.1-
Attachment #364525 - Flags: approval1.9.1.1+
Dan: can we get this landed on mozilla-1.9.1 ASAP, please?
(In reply to comment #36)
> Dan: can we get this landed on mozilla-1.9.1 ASAP, please?
What is the best/simplest way for QA to verify this bug?
IMO, testing this requires having a SOCKS5 server that can be used to 
inject malicious (or simply out of spec) replies.  

Perhaps the reporter can help?
Flags: wanted1.8.1.x+
Group: core-security
(In reply to comment #38)
> What is the best/simplest way for QA to verify this bug?

As I mentioned in comment #14 I've used modified SOCKS server Antinat to test this bug. I can temporarily set it up on my server for verification it you want.
Flags: →
You need to log in before you can comment on or make changes to this bug.