Last Comment Bug 459524 - SOCKS5 reply with DNS name longer than 15 characters corrupts data
: SOCKS5 reply with DNS name longer than 15 characters corrupts data
: fixed1.9.0.12
Product: Core
Classification: Components
Component: Networking (show other bugs)
: 1.9.0 Branch
: All All
: -- minor (vote)
: ---
Assigned To: Nelson Bolyard (seldom reads bugmail)
: Patrick McManus [:mcmanus]
Depends on:
  Show dependency treegraph
Reported: 2008-10-11 15:38 PDT by Andrej Andolsek
Modified: 2009-12-16 16:20 PST (History)
11 users (show)
mbeltzner: blocking1.9.1-
benjamin: wanted1.9.1+
mbeltzner: blocking1.9.1.1-
samuel.sidler+old: wanted1.9.1.x+
dveditz: blocking1.9.0.9-
dveditz: wanted1.9.0.x+
dveditz: wanted1.8.1.x+
samuel.sidler+old: in‑testsuite?
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

proposed patch (untested) v1 (6.74 KB, patch)
2008-10-11 21:00 PDT, Nelson Bolyard (seldom reads bugmail)
cbiesinger: review+
cbiesinger: superreview+
Details | Diff | Splinter Review
patch v2 (still untested) (6.03 KB, patch)
2009-02-25 22:47 PST, Nelson Bolyard (seldom reads bugmail)
no flags Details | Diff | Splinter Review
patch v3 - tested (6.04 KB, patch)
2009-02-27 08:05 PST, Nelson Bolyard (seldom reads bugmail)
nelson: review+
mbeltzner: approval1.9.1-
mbeltzner: approval1.9.1.1-
mbeltzner: approval1.9.1.2+
dveditz: approval1.9.0.12+
Details | Diff | Splinter Review

Description Andrej Andolsek 2008-10-11 15:38:06 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/2008092417 Firefox/3.0.3
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/2008092417 Firefox/3.0.3

If Firefox receives SOCKS5 reply with DNS name longer than 15 characters then the subsequent data transfer will be corrupted. This will result with either pages being corrupted at the beginning, or with a Save As dialog being opened.

This quite possibly is a buffer overrun bug! And quite possibly it's an exploitable one too (that's why I'm classifying it as critical bug). It looks like Firefox can handle SOCKS5 replies up to 22 bytes in size - 22 bytes is needed for SOCKS5 reply holding IPv6 address. SOCKS5 reply with 15 character DNS name takes 22 bytes as well. Btw the maximum SOCKS5 reply is 262 bytes.

Reproducible: Always

Steps to Reproduce:
Comment 1 Andrej Andolsek 2008-10-11 16:39:22 PDT
I've checked the code and it it's not a buffer overrun bug. In netwerk\socket\base\nsSOCKSIOLayer.cpp - function ConnectSOCKS5():

     case 0x03: // FQDN (should not get this back)
        default: // unknown format
            // if we get here, we don't know our external address.
            // however, as that's possibly not critical to the user,
            // we let it slide.
            PR_InitializeNetAddr(PR_IpAddrNull, 0, extAddr);
            //return NS_ERROR_FAILURE;

0x03 - DOMAINNAME can indeed be sent back, but it is highly unusual. 

Let me explain how this can happen. Tunnelier is an SSH client that uses SOCKS proxy forwarding. Tunnelier acts as SOCKS proxy server, but connections to actual targets are made by SSH server. So when generating SOCKS5 reply Tunnelier must use the address of SSH server. Now comes the tricky part. If Tunnelier itself is connected through (regular) proxy server, and Tunnelier is configured not to resolve DNS name locally, then Tunnelier doesn't know the IP address of SSH server. Only in this case will Tunnelier use DNS name in SOCKS5 reply.

I propose the following code changes:
1) case 0x03 should simply skip bytes remaining - response_len = 4 + 1 + response[4] + 2
2) default should yield failure, because you don't know how much bytes there is remaining in SOCKS5 reply.

Also I got a couple of more questions:
1) How come the code is happily reading response bytes without checking how much bytes it's been actually read? (What if PR_Recv() read only 1 byte?)
2) What if 10 byte SOCKS5 reply is returned (IPv4) and PR_Recv() reads whole 22 bytes? Well OK, HTTP server waits on HTTP request before sending anything, so that's why this works - but it doesn't look clean.
Comment 2 Nelson Bolyard (seldom reads bugmail) 2008-10-11 19:37:22 PDT
I agree there's no buffer overrun here.  
If the socks response is larger that the largest size this code expects,
the excess response is left unread in the kernel's socket receive buffers,
where it will appear as a prefix to the next set of expected received data
(e.g. to the subsequent http response).  

There could be an underrun.  A "short read" could fail to read the entire
response into the reader's buffer before trying to parse it.  The actual 
amount of data read in is ignored, other than to assure it's not -1.
The code should deal with "short reads" properly.

Remember, http isn't the only protocol used over socks, and some protocols
are server-speaks-first, so reading in more data into the response buffer
than are actually contained in the socks response could cause some of 
the initial data from the desired remote server to be read into the socks
response buffer and then discarded.  

The right way to do this (IMO) is to initially read in 4 bytes, the header of 
the socks response, then interpret the 4th byte to see the type of response
and its proper length.  If that 4th byte is 03, then read in the 5th byte
for the length.  Finally read in the rest of the response according to the
length determined.  Deal with "short reads" at every step.  This ensures
that no underruns occur, and we don't read in any of the target server's
data by assuming it to be part of the socks response.
Comment 3 Nelson Bolyard (seldom reads bugmail) 2008-10-11 21:00:23 PDT
Created attachment 342755 [details] [diff] [review]
proposed patch (untested) v1

Reed, you've worked on this code before.  Please review
Comment 4 Nelson Bolyard (seldom reads bugmail) 2008-10-11 21:03:32 PDT
Reed, please review the patch I attached to this bug.
Comment 5 Reed Loden [:reed] (use needinfo?) 2008-10-11 21:40:19 PDT
I think you're probably wanting biesi.
Comment 6 Reed Loden [:reed] (use needinfo?) 2008-10-11 21:41:12 PDT
I apparently fail at moving bugs. Sorry about the mess.
Comment 7 Christian :Biesinger (don't email me, ping me on IRC) 2008-10-25 05:33:46 PDT
hm, bug 353040 seems to have a pretty similar patch...
Comment 8 Michal Novotny (:michal) 2008-10-29 07:27:59 PDT
(In reply to comment #7)
> hm, bug 353040 seems to have a pretty similar patch...

The patches are basically the same. There is a difference in timeout handling.

             PR_InitializeNetAddr(PR_IpAddrNull, 0, extAddr);
-            //return NS_ERROR_FAILURE;
+            extPort = response[response_len - 2] << 8) | 
+	              response[response_len - 1];

This is unnecessary since extPort isn't used anywhere.
Comment 9 Nelson Bolyard (seldom reads bugmail) 2008-10-29 11:07:23 PDT
The patches are similar, but don't address the same set of issues exactly.
Michal's also attempts to address correcting timeout computations, which
seems good.  Mine also attempts to address a couple issues in parsing the 
SOCKS5 response while not increasing the number of PR_Recv calls more than

Michal spotted a bug in my patch.  extPort SHOULD be used.  It should be 
computed before the call to PR_InitializeNetAddr, and the value should be
used in that call.  e.g. 

-            PR_InitializeNetAddr(PR_IpAddrNull, 0, extAddr);
-            //return NS_ERROR_FAILURE;
+            extPort = response[response_len - 2] << 8) | 
+                  response[response_len - 1];
+            PR_InitializeNetAddr(PR_IpAddrNull, extPort, extAddr);

This should be done for the FQDN case, only, as my patch does.
Comment 10 Robert Sayre 2009-01-04 15:46:35 PST
Comment on attachment 342755 [details] [diff] [review]
proposed patch (untested) v1

Let's not let this sg bug sit. Is this attachment ready for review? We can find someone else to look at it if need be.
Comment 11 Christian :Biesinger (don't email me, ping me on IRC) 2009-02-24 10:47:32 PST
Comment on attachment 342755 [details] [diff] [review]
proposed patch (untested) v1

r+sr=biesi, sorry for the delay. do make the change from comment 9 before checking this in.
Comment 12 Nelson Bolyard (seldom reads bugmail) 2009-02-25 22:47:25 PST
Created attachment 364261 [details] [diff] [review]
patch v2 (still untested)

This patch is v1 plus the changes I suggested previously. 

Do we have any one who can test this with a real socks server to ensure
no regressions?  My employer has stopped using socks servers and I no 
longer have any with which to test.
Comment 13 Michal Novotny (:michal) 2009-02-26 03:05:48 PST
(In reply to comment #12)

I'll try to test it.
Comment 14 Michal Novotny (:michal) 2009-02-27 04:50:51 PST
(In reply to comment #12)
> Created an attachment (id=364261) [details]
> patch v2 (still untested)

+  extPort = response[response_len - 2] << 8) |

A bracket is missing before response.

I've tested it with antinat and SSH's SOCK server and found no regression. My SSH (openssh-5.1p1-3.fc10.x86_64) connected through proxy as described in comment #1 doesn't return response with FQDN but also IPv4 with both ip and port zeroed. I've build my own patched version of antinat to test this potential problem and it works correctly. I've also tested "byte by byte" relaying (bug #353040) and it also works.
Comment 15 Nelson Bolyard (seldom reads bugmail) 2009-02-27 08:05:22 PST
Created attachment 364525 [details] [diff] [review]
patch v3 - tested

Thanks for testing this patch.  I'm carrying forward the r+ from the 
previous one (I understand that's OK to do), and requesting checkin.
Comment 16 Daniel Veditz [:dveditz] 2009-03-06 11:13:15 PST
Not blocking 1.9.0 but wanted. Since we have no testcase and no testing infrastructure for this taking a branch patch without broader trunk testing is a bit scary.
Comment 17 Nelson Bolyard (seldom reads bugmail) 2009-03-06 13:53:10 PST
I've set checkin-needed on this bug.  I'm not sure of all the remaining 
protocol steps required to get the checkin done.  If it needs to be checked
in to Hg first, someone else will have to do that.
Comment 18 Mike Beltzner [:beltzner, not reading bugmail] 2009-03-10 12:11:19 PDT
Doesn't block, but once baked please nominate for approval.
Comment 19 Benjamin Smedberg [:bsmedberg] 2009-03-10 13:19:12 PDT
Comment 20 Daniel Veditz [:dveditz] 2009-03-21 11:05:00 PDT
Comment on attachment 364525 [details] [diff] [review]
patch v3 - tested

It's baked, requesting approval.
Comment 21 Benjamin Smedberg [:bsmedberg] 2009-04-13 12:55:51 PDT
Comment 22 Mike Beltzner [:beltzner, not reading bugmail] 2009-04-13 21:22:13 PDT
This is one of two bugs (along with bug 487967) associated with a Tp3 regression on XP:,2419206,2417619&sel=1239471988,1239817588
Comment 23 Mike Beltzner [:beltzner, not reading bugmail] 2009-04-14 08:04:53 PDT
bug 487967 comment 9 implies that we should try backing this one out first
Comment 24 Mike Shaver (:shaver -- probably not reading bugmail closely) 2009-04-14 08:14:03 PDT
Yep -- can today's sheriff do so?

(Maybe it could reland with tests, too. :-( )
Comment 25 Johnathan Nightingale [:johnath] 2009-04-14 09:22:14 PDT
Yeah, I can back it out.

This is a tiny regression in absolute terms (<2ms) so I'm a little worried about the noise floor, but on the graph it's visible, so I'm hoping the backout effect will be similarly evident.

Cancelling the fixed1.9.1 keyword instead of re-opening, since this is still landed on trunk (where it doesn't seem to have caused any Tp blip at all:,395020,395048,912148,1431867&sel=1239089415,1239637970 )
Comment 26 Johnathan Nightingale [:johnath] 2009-04-16 09:22:25 PDT
Looks like the backout did bring numbers down, though again, it's noisy.[{%22test%22:%221%22,%22branch%22:%223%22,%22machine%22:%2235%22},{%22test%22:%222%22,%22branch%22:%223%22,%22machine%22:%2232%22},{%22test%22:%222%22,%22branch%22:%223%22,%22machine%22:%2233%22},{%22test%22:%222%22,%22branch%22:%223%22,%22machine%22:%2234%22},{%22test%22:%2211%22,%22branch%22:%223%22,%22machine%22:%2248%22}]

There's all the XP boxes doing Tp (including a nochrome and a tp_fast box). If you look at 4/14, around 10:30AM, you can see the drop. NoChrome, for instance, drops from 150.58 to 147.93.  Unfortunately it looks like something else regressed Tp not long afterwards (and then maybe was backed out or counterbalanced by a win?)

It's a mess, but I do think the data supports the theory that this patch has a Tp hit on XP.

Is that expected and understood and worth it, or mystical?
Comment 27 Benjamin Smedberg [:bsmedberg] 2009-04-16 09:35:20 PDT
Absolutely mystical: I'm almost positive our tests don't hit SOCKS proxies at all, which would mean this code isn't even run.
Comment 28 Nelson Bolyard (seldom reads bugmail) 2009-04-16 09:51:34 PDT
I imagined that these page load time results must be using proxies, 
because why else would socks proxy code be getting called?  

If this code is not being called and it affects page load times, then it does 
border on mystical.  But I've seen situations where the mere insertion of a 
single line of code causes other code to be moved in the process address space, such that new and frequent collisions occur in the instruction cache. On some 
platforms, such as Solaris, there are optimization techniques that can greatly 
reduce i-cache collisions. I'm not aware of any for Windows though.
Comment 29 Mike Beltzner [:beltzner, not reading bugmail] 2009-05-27 15:42:37 PDT
Comment on attachment 364525 [details] [diff] [review]
patch v3 - tested

We can try again for 3.5.1, but right now it's time to minimize churn for 3.5 so revoking approval.
Comment 30 Daniel Veditz [:dveditz] 2009-06-03 16:00:59 PDT
Comment on attachment 364525 [details] [diff] [review]
patch v3 - tested

Approved for, a=dveditz for release-drivers
Comment 31 Nelson Bolyard (seldom reads bugmail) 2009-06-03 16:28:57 PDT
Dan, please commit the patch in the 1.9.0 repository.  Thanks.
Comment 32 Daniel Veditz [:dveditz] 2009-06-25 14:26:03 PDT
Checking in netwerk/socket/base/nsSOCKSIOLayer.cpp;
/cvsroot/mozilla/netwerk/socket/base/nsSOCKSIOLayer.cpp,v  <--  nsSOCKSIOLayer.cpp
new revision: 1.21; previous revision: 1.20
Comment 33 Samuel Sidler (old account; do not CC) 2009-07-06 20:24:06 PDT
Fixed this in, so we should fix it in
Comment 34 Samuel Sidler (old account; do not CC) 2009-07-13 14:58:13 PDT
Comment on attachment 364525 [details] [diff] [review]
patch v3 - tested

Approved for a=ss

Dan, can you please land this on as well?
Comment 35 Mike Beltzner [:beltzner, not reading bugmail] 2009-07-14 13:30:22 PDT
Comment on attachment 364525 [details] [diff] [review]
patch v3 - tested

Let's make that :)
Comment 36 Mike Beltzner [:beltzner, not reading bugmail] 2009-07-21 17:16:04 PDT
Dan: can we get this landed on mozilla-1.9.1 ASAP, please?
Comment 37 Reed Loden [:reed] (use needinfo?) 2009-07-21 17:30:07 PDT
(In reply to comment #36)
> Dan: can we get this landed on mozilla-1.9.1 ASAP, please?
Comment 38 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2009-07-31 10:47:24 PDT
What is the best/simplest way for QA to verify this bug?
Comment 39 Nelson Bolyard (seldom reads bugmail) 2009-07-31 11:19:14 PDT
IMO, testing this requires having a SOCKS5 server that can be used to 
inject malicious (or simply out of spec) replies.  

Perhaps the reporter can help?
Comment 40 Michal Novotny (:michal) 2009-10-06 16:24:04 PDT
(In reply to comment #38)
> What is the best/simplest way for QA to verify this bug?

As I mentioned in comment #14 I've used modified SOCKS server Antinat to test this bug. I can temporarily set it up on my server for verification it you want.

Note You need to log in before you can comment on or make changes to this bug.