Closed Bug 280661 Opened 20 years ago Closed 14 years ago

SOCKS proxy server connection timeout hard-coded

Categories

(Core :: Networking, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla5

People

(Reporter: snabb, Assigned: michal)

References

Details

Attachments

(2 files, 8 obsolete files)

User-Agent:       Opera/7.54 (X11; FreeBSD i386; U)  [en]
Build Identifier: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.5) Gecko/20050109

I am using Mozilla in a non-typical network which requires using a SOCKS server 
which is located behind a high-latency network connection. Unfortunately the 
connection timeout of the SOCKS server conenction is currently hard-coded (ugly!
) in the source code to be 10 seconds. This means that my connection to the 
SOCKS server always times out (unless I patch it in source code before 
compiling).

My suggestion is to make this configurable in about:config. It should be very 
easy to fix for anyone who knows Mozilla configuration mechanism (I don't).


Reproducible: Always

Steps to Reproduce:
1. use Mozilla with high-latency SOCKS server (connection latency > 10 s)
2. try connecting to any web site
3.

Actual Results:  
The connection to the SOCKS server times out virtually always.


Expected Results:  
I should be able to configure the timeout to accomodate my network connection 
timeout requirements.

I am currently using the following source code patch, which pinpoints the hard-
coded timeout setting. Unfortunately I always forget to apply the patch when 
upgrading Mozilla, which means that I always have to compile twice :).

--- mozilla/netwerk/socket/base/nsSOCKSIOLayer.cpp.dist Tue Feb  1 15:55:51 2005
+++ mozilla/netwerk/socket/base/nsSOCKSIOLayer.cpp      Tue Feb  1 15:56:13 2005
@@ -534,11 +534,11 @@
     // uses blocking network calls, the app can appear to hang for a maximum
     // of this time if the user presses the STOP button during the SOCKS
     // connection negotiation. Note that this value only applies to the 
     // connecting to the SOCKS server: once the SOCKS connection has been
     // established, the value is not used anywhere else.
-    PRIntervalTime connectWait = PR_SecondsToInterval(10);
+    PRIntervalTime connectWait = PR_SecondsToInterval(60);
 
     // Connect to the proxy server.
     status = fd->lower->methods->connect(fd->lower, &proxyAddr, connectWait);
     
     if (PR_SUCCESS != status) {
Assignee: general → darin
Component: General → Networking
Product: Mozilla Application Suite → Core
QA Contact: general → benc
Version: unspecified → Trunk
*** Bug 280665 has been marked as a duplicate of this bug. ***
Darin:

As I recall, your design philosophy was that we should use the OS-level
timeouts, and remove the application-coded timeouts.

Did this one slip through, or was there a reason it is in there?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: helpwanted
The patch doesn't seem like a good idea, because that timeout blocks further network activity.  (Thankfully the UI is still responsive.)

A SOCKS proxy doesn't return a SOCKS connect response until the proxy successfully establishes a connection to the remote site.  That can take up to the tcp connect timeout on the SOCKS machine, and Firefox should let it take that long without blocking.

One way to deal with it would be to return success from that function (nsSOCKSIOLayer.cpp 361, iirc) after the connect request is sent.  Then there could be a flag at a higher layer to indicate whether the SOCKS connect response has been received.  When mozilla receives data from a tcp connection, if Firefox uses a SOCKS proxy and the flag is unset, then Firefox would verify that the buffer is the expected SOCKS connect response, and would set the flag.

Alternatively, is there a way to keep the entire SOCKS connect process from blocking as it does?  Non-SOCKS tcp connections don't block (you can click "Stop" and immediately connect to another site)... so why do SOCKS connections block (you can click "Stop", but attempting to connect to another site is delayed while the SOCKS timeout expires... extremely annoying if the timeout is increased to 60 seconds)?

I ran into this issue using Tor (which behaves like a socks proxy).  10 seconds is not nearly enough for connections to many .onion sites.  Standard practice is to use Firefox->Privoxy->Tor.  Unfortunately, I don't think privoxy supports pipelining, or else it has some other problem that makes it rather slow even for Tor network standards.  Direct Firefox->Tor is faster, at least when it doesn't timeout, so it would be nice if this issue could be resolved.
Assignee: darin → nobody
QA Contact: benc → networking
I am experiencing difficulties with this bug also. At this point, a badly thought out patch is better than no connectivity, and this bug is one of the sole reasons why I currently don't advertise Firefox for corporate use. Not only does it prohibit me personally from being able to work with firefox, but in other corporate and SOCKS environments, many other people could be having this problem. I know for a fact that this problem is one of the sole reasons that Tor cannot operate well through Firefox.

I'd like to ask anyone that can to at least give us something so we can use SOCKS proxies with high latency with Firefox. I tried creating a patch myself, but I ended up failing miserably. Hopefully this can be resolved, and Firefox will rule once again! :D

Thanks in advance for anyone who helps!
I think the best way to get rid of the hard coded 10 second timeout is to rewrite the SOCKS code to work with non-blocking I/O, so the event loop isn't halted by blocking I/O while the SOCKS handshake completes.

NSPR already supports non-blocking connections with plain sockets with PR_Connect() and PR_ConnectContinue(). The SOCKS I/O layer could use the socket in non-blocking mode, and keep track of the state of the handshake in the nsSOCKSSocketInfo class. PR_Connect() would be used to start the handshake and PR_ConnectContinue() to advance the state of the handshake when I/O events are discovered down the line.

The only thing that's missing is some way to notify the transport what sort of events to poll for. For PR_Connect() and PR_ConnectContinue(), it is assumed only write events should be polled for, which is fine for plain non-blocking connections, but SOCKS also needs to signal that it wants to read from the socket.  Perhaps we could take a hint from OpenSSL and add PR_WANT_READ_ERROR and PR_WANT_WRITE_ERROR error codes and change the definition of PR_Connect() and PR_ConnectContinue() accordingly?
Wan-Teh Chang,

Do the proposed changes to NSPR in comment #5 sound OK?

Chris,

The non-blocking approach for SOCKS sounds promising (though I'll confess I don't know SOCKS well enough to know definitively).  Let's see if the NSPR folks agree to the change, and if they do, we can certainly try this out if someone's willing to contribute a patch.
Jason,

This is already handled by the "poll" method of a PRFileDesc.
It's equivalent to OpenSSL's use of WANT_READ_ERROR and
WANT_WRITE_ERROR.

The SOCKS I/O layer can define the "poll" method to tell the
caller what I/O events we should poll for at the lowest level
during connection setup.
(In reply to comment #7)
> The SOCKS I/O layer can define the "poll" method to tell the
> caller what I/O events we should poll for at the lowest level
> during connection setup.

Thanks for mentioning this. I didn't notice this before.

So while normally the poll flags are set to PR_POLL_EXCEPT | PR_POLL_WRITE during non-blocking connection, the SOCKS layer's implementation of the poll method may override this so read or write events are polled instead?
Yes, that's exactly what the poll method is intended for.
Ok, I'd like to write a patch for this. I'll do it pretty much as mentioned in comment 5, although with the poll method taken into account and no additional error codes.
Chris--great, go for it.
This patch is a draft of of the non-blocking SOCKS handshake. I've tested it on Linux with Tor and SSH with dynamic port forwarding, and it seems to work OK. 

The only problem I've run into so far is with error handling. There's trouble distinguishing between errors connecting to the proxy and errors that the proxy reports back when it rejects a connection request. In both cases, the error is set to NS_ERROR_PROXY_CONNECTION_REFUSED in nsSocketTransport2.cpp, and the user gets "The proxy server is refusing connections", even if the proxy server is online and correctly reporting that it had trouble connecting to a site.

Any thoughts about the patch or the error handling issue?
This patch improves error handling. When a SOCKS connection request is rejected by the proxy, the error is recorded in the SOCKS layer (rather than globally), and it is reported with an extra call to PR_ConnectContinue(). A slight modification is made to nsSocketTransport2.cpp to extract the handshake error. This change helps to prevent Firefox from falsely reporting to the user that the proxy server is rejecting connections when the proxy server is actually online.

Running the netwerk unit tests reports no regressions on Linux (for me), and connecting through Tor and SSH seems to work.
Attachment #408453 - Attachment is obsolete: true
Attachment #410350 - Flags: review?(cbiesinger)
Attachment #410350 - Flags: review?(cbiesinger)
Attachment #410350 - Flags: review?(cbiesinger)
Can we get someone to review this latest patch? It's been quite a while. With this patch, the Tor project could greatly simplify its bundles, possibly even shipping an all-in-one xpi with just Torbutton+tor.
Tested the patch on Linux and it works. Thanks chrisd
Hi all!!!!!!
Yeah, maybe i had to say this before, but i'm using this patch in my custom build of firefox i made for my Tor Browser Bundle for Linux!!!!!!!!
I also suggest you to merge this patch with the main source trunk!!!!!!!!! It's very useful and it works very well!!!!!!! It has been tested for a whole month so far!!!!! Chris did a very good work!!!!!!!!!!!!!!!

bye!!!!!!!!!
~bee!!!!!!
http://honeybeenet.altervista.org/factorbee/
The latest patch works fine on FreeBSD, I've been using it for over a week now and I didn't encounter any issue whatsoever.

It took 5 years, but I can finally use tor directly as a socks proxy on firefox.

Great job guys! :)
Comment on attachment 410350 [details] [diff] [review]
non-blocking SOCKS handshake with improved error handling.

Jason, can you review this, or do we have someone else who can?  Christian is ... laggy.
Attachment #410350 - Flags: review?(jduell.mcbugs)
Glad to hear from the testers... Yes, it would be good to get this reviewed.
Using OpenBSD 4.7-ish and applied patch to Firefox 3.5.9: it works flawlessy.
It would be very helpful if someone could review the patch and commit it to trunk...

--
baiton
Comment on attachment 410350 [details] [diff] [review]
non-blocking SOCKS handshake with improved error handling.

I don't know SOCKS yet, and this isn't a small patch, so there's no way I'll get to this before e10s ships.  Biesi is also likely to be busy with other things.  Do we have anyone who knows SOCKS well enough to do a +r on this?
Attachment #410350 - Flags: review?(jduell.mcbugs)
Honza?
I think it greatly increases chances of quick acceptance of patches if you can demonstrate that your patch is free of regressions.

Are there any tests in our test suites that test the correctness of SOCKS proxy operation?

If we currently don't have any automated tests for SOCKS operation, it would be a great contribution to add such tests, and demonstrate the behaviour with and without your new patch is as expected.

Having automated tests probably relies on having an appropriate test proxy available. Is there a public one that could be used, maybe a proxy that gets configured to accept requests from inside the Mozilla network? Or could a proxy automatically be started as part of the test suite procedure?
The last time I checked, there wasn't a test for SOCKS. I'd be willing to write one if we can work out what server to use... I suppose that a SOCKS server could be written specifically for testing, if we really needed that. OpenSSH client also has a SOCKS server for dynamic port forwarding.
It would be really great to have some SOCKS test coverage.  It would be best to have a server as part of our source tree (ideally in a scripting language), which we could reliably count on to run.  But failing that, even some tests that we could run against a well-known or configurable server would be progress.   

I'm wondering if there might be some way to make this work so that such tests could be run in our automated tryserver/tinderbox builds. (Can we set up an "always available" socks server somewhere?). Cc-ing some people who are more likely to know than me. 

But even if full automation isn't possible, manual testing would be a vast improvement, so feel free to get started on writing some tests, while we sort out whether and how we could provide the server end.
Presumably you could write a SOCKS server in JS, and write xpcshell tests that go xpcshell -> SOCKS -> httpd.js.

I don't know how fully-featured your server would have to be in order for it to be usable by our network stack.
That should work(In reply to comment #26)
> Presumably you could write a SOCKS server in JS, and write xpcshell tests that
> go xpcshell -> SOCKS -> httpd.js.
> 
> I don't know how fully-featured your server would have to be in order for it to
> be usable by our network stack.

That should work. The SOCKS server for testing shouldn't need to be too complicated.
I've started off writing a JS "SOCKS echo server" that simply accepts client connections, handles the SOCKS handshake, and then relays any data the client sends back to the client. That's a little easier than having the SOCKS server make an outgoing connection to httpd.

It might be a bit hard to get the unpatched SOCKS client to run in the same thread as the server, because the client doesn't play nice with the event loop. The SOCKS server may need to be in a separate thread or process. The patched client works OK here, though.
If need be, you can run a new xpcshell as a subprocess from an xpcshell test. I do that for some crashreporter tests (because I want to crash the subprocess). You can see that code here:
http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/test/unit/head_crashreporter.js#29

You'd want to run the process without blocking on it, though, see the docs here:
https://developer.mozilla.org/en/XPCOM_Interface_Reference/nsIProcess

and obviously you'll need to kill the process before your test exits.
Attachment #410350 - Attachment is obsolete: true
Attachment #410350 - Flags: review?(cbiesinger)
Some of the modified files where moved around, so the patch has been updated with the new locations.
Attached patch SOCKS xpcshell unit test (obsolete) — Splinter Review
Tests the SOCKS I/O layer; works with new and old versions.
Took Ted's advice for creating a subprocess in the unit test; it seems to work alright with both the old version of the SOCKS layer and the yet-to-be-committed non-blocking version.

Anyone have comments on the unit test? And is there anything else I can do to encourage Mozilla folks to get this stuff committed?
YEAH!!!!! thats super!!!!!!!!!!!!!!!!!!!
Chris is a very good coder!!!!! He's the only one doing something of useful for this bug report!!!!!!!!!!
I believe nobody at googlezilla is helping him!!! is it?!!!!! why?!!!!!
After all his efforts, you should at least push his patch into the main source!!!!!!!!! I wonder about what funny excuse you'll seek out to say him no again!!! hah!!!!

bye!!!!!!!!!
~bee!!!!!!
Comment on attachment 450585 [details] [diff] [review]
SOCKS xpcshell unit test

I'm putting this in my review queue so that I don't forget to take a look at it. Thanks for taking the time to write a test (even though you had to go out of your way and implement your own test proxy!) Having test coverage of your patch should make this a lot easier to land.
Attachment #450585 - Flags: review?(ted.mielczarek)
Bee:
We don't have an infinite supply of manpower, which is why we rely so heavily on contributions from people like Chris. We do have a process for accepting patches, which is that they need to be reviewed by a module peer. Chris' patch has not yet been reviewed, so it can't be landed. Hopefully we'll get someone to review it soon and get it landed, since Chris has indeed done some great work here.

Calling people names and casting aspersions is not a great way to convince people to work with you, however.
Assignee: nobody → chrisd.mang
Comment on attachment 450584 [details] [diff] [review]
Patch for non-blocking SOCKS I/O layer updated to reflect new source locations.

Jason, if you don't have time for this, could you redirect to someone else? I'll review the test changes Chris made.
Attachment #450584 - Flags: review?(jduell.mcbugs)
Status: NEW → ASSIGNED
It's working on NetBSD 5.0.2/Firefox 3.6.3 as well. Any hope to have the patch reviewed and committed to trunk?
Attachment #450584 - Flags: review?(jduell.mcbugs) → review?(michal.novotny)
The problem is that there aren't any SOCKS experts active in the project, so reviewing the patch requires someone to get up to speed on the SOCKS stuff first...

Michal, are you actively working on this?  If so, do you have anything like an ETA?  If not, do we need to find someone else to do it?
I think I can do it on Tuesday/Wednesday.
When will this non-blocking socket patch be applied to official release?
odakim: after it's reviewed, approved, committed, and released. we don't read tea leaves. just be patient. it could take 5 years, or it could be present in the next release, whose date we don't even know.
> odakim: after it's reviewed, approved, committed, and released.
> just be patient

After five years, in which step are we in?!!!!!! 
are we before the first one?!!!! lolol!!!!!!!!!!!!

Hopefully this patch will be applied within the current century!!!!!!!!!!!! but i ain't going to hold my breath!!!!
Perhaps somebody has to turn this patch into a "blocking" thing!!!!!! YEAH!! because atm, nobody at mozilla seems to care enough about it!!!

bye!!!!!!!!
~bee!!!!
A BUMP a day, keeps Mozilla awake!!
soget, please do not bump bugs. Bugzila is not a forum, and doing so will only get you banned for spamming. Also, do not post this bug on other, unrelated bugs.
Comment on attachment 450584 [details] [diff] [review]
Patch for non-blocking SOCKS I/O layer updated to reflect new source locations.

Chris, the patch looks good in general. Most of my comments are just about the coding style.


> +        rv = dns->Resolve(mProxyHost, 0, getter_AddRefs(mDnsRec));
> +        if (NS_FAILED(rv)) {
> +            LOGERROR(("socks: DNS lookup for SOCKS proxy %s failed",

Since our goal is to make connecting non-blocking it makes sense to use AsyncResolve() instead of blocking Resolve().

> +    PRInt32 IsConnected() const { return mState == SOCKS_CONNECTED; }

PRInt32 -> PRBool

> +const char *
> +nsSOCKSSocketInfo::StateToStr() const
> +{
> +    switch (mState) {
> +        case SOCKS_INITIAL:
> +            return "SOCKS_INITIAL";

Log the state only as a number and remove the StateToStr() method.

>  const nsCString &DestinationHost() { return mDestinationHost; }
>  const nsCString &ProxyHost()       { return mProxyHost; }
>  PRInt32          ProxyPort()       { return mProxyPort; }
>  PRInt32          Version()         { return mVersion; }
>  PRUint32         Flags()           { return mFlags; }

These methods are no longer used, remove them.

> +PRStatus
> +nsSOCKSSocketInfo::ContinueConnectingToProxy(PRFileDesc *fd, PRInt16 oflags)
> +{
> +    PRStatus status;
> +
> +    NS_ABORT_IF_FALSE(mState == SOCKS_CONNECTING_TO_PROXY,
> +                      "Continuing connection in wrong state!");

Checking the state here (and in other similar cases) doesn't make much sense, because the method is called only in the correct state. Please keep only meaningful checks and change them to NS_ASSERTION. Also note that these macros do nothing in release build. So any case that should be handled as error should really throw some error.

> +  PRInt16 HowToPoll() const { return mPollFlags; }

Please rename it to GetPollFlags() or just PollFlags()

> +  void PushUint8(PRUint8 d);
> +  void PushUint16(PRUint16 d);
> +  void PushUint32(PRUint32 d);

All PushUintXXX/PopUintXXX should be probably made as macros. Btw. all this Push/Pop concept makes the code hard to read (especially to verify the protocol correctness). Wouldn't simple direct buffer addressing be better in our case?

> +  PRStatus Fill(PRFileDesc *fd);
> +  PRStatus Flush(PRFileDesc *fd);

The names aren't self-explanatory, please rename it.

> +  // A connection failure occured, try another address
> +  mState = SOCKS_INITIAL;
> +  return ConnectToProxy(fd, PR_INTERVAL_NO_TIMEOUT);

Shouldn't we honor the timeout?

> +  PRStatus SendGreeting(PRFileDesc *fd);
> +  PRStatus ConnectToProxy(PRFileDesc *fd, PRIntervalTime to);
> +  PRStatus ContinueConnectingToProxy(PRFileDesc *fd, PRInt16 oflags);
> +  PRStatus HandleSocks4Connect(PRFileDesc *fd);
> +  PRStatus HandleSocks5Auth(PRFileDesc *fd);
> +  PRStatus HandleSocks5Connect(PRFileDesc *fd);

The state and method names are a little bit confusing. E.g. SendGreeting() sends connect request in case of SOCKS4 but method selection in case of SOCKS5. I'd prefer something like:

SendMethodRequest()
RecvMethodResponse()
SendConnectRequest()
RecvConnectResponse()
...

And the same for the states:

SOCKS_SEND_METHOD_REQUEST
...

> +  PRStatus HandleSocks5Connect(PRFileDesc *fd);

I'd prefer to split this method into 2. The first would read first 5 bytes and the second the rest.

> +  nsCString mInBuf;
> +  nsCString mOutBuf;

Using nsCString for this purpose isn't the best idea. E.g. following memory allocations happen in case of HandleSocks5Auth()

PushUint8(0x05); // alloc 2 bytes
PushUint8(0x01); // realloc to 3 bytes
PushUint8(0x00); // realloc to 5 bytes
PushUint8(0x01); 
PushNetAddr(addr); // realloc to 9 bytes
PushNetPort(addr);  // realloc to 17 bytes
Flush(); // Truncate() releases the buffer

It should be possible to minimize memory allocations by calling GetMutableData on mOutBuf and using mOutBuf == "" instead of mOutBuf.Truncate(), but this would be an ugly hack, so I would prefer to use a char[] array.

> +  // Flush all pending data for a SOCKS request, then read the
> +  // response. If we're still connecting to the SOCKS proxy and
> +  // there's no I/O to do yet, Flush() and Fill() will simply return
> +  // PR_SUCCESS

Just a suggestion: In each state we know whether if we want to read or write data. IMO it would be cleaner to either read or write according to the state instead of always trying to write and read all data. And then you could use just one buffer for both reading and writing.

> +  // A SOCKS request was rejected, so call PR_ConnectContinue() to instruct
> +  // the SOCKS I/O layer to set the actual error code.
> +  //
> +  else if (PR_INVALID_STATE_ERROR == code &&
> +           mProxyTransparent &&
> +           !mProxyHost.IsEmpty()) {
> +      PR_ConnectContinue(fd, 0);
> +      code = PR_GetError();
> +      rv = ErrorAccordingToNSPR(code);
> +  }

I'm afraid that this change to improve the error handling isn't acceptable, but I'll delegate this decision to the necko peer.
Attachment #450584 - Flags: review?(michal.novotny) → review-
Michal, thanks for looking through the patch!  I have a few questions/comments:

> PRInt32 -> PRBool

Or just bool.

> Checking the state here (and in other similar cases) doesn't make much sense,
> because the method is called only in the correct state. Please keep only
> meaningful checks and change them to NS_ASSERTION.

I'd say that checking assumed invariants that currently _aren't_ violated with NS_ABORT_IF_FALSE is exactly the right thing to do....  why wouldn't it be?  That's what NS_ABORT_IF_FALSE is for.

> so I would prefer to use a char[] array

How does that help the realloc situation?  If we just want a growable thing with some nonzero initial size, we have string classes for that... is that what we need here?

> I'm afraid that this change to improve the error handling isn't acceptable,

What problems does/could it cause?  (Genuine question; if a necko peer is supposed to make a call about this, it would be good to have all the relevant information.)
(In reply to comment #48)
> > Checking the state here (and in other similar cases) doesn't make much sense,
> > because the method is called only in the correct state. Please keep only
> > meaningful checks and change them to NS_ASSERTION.
> 
> I'd say that checking assumed invariants that currently _aren't_ violated with
> NS_ABORT_IF_FALSE is exactly the right thing to do....  why wouldn't it be? 
> That's what NS_ABORT_IF_FALSE is for.

OK, I just think that it isn't necessary to check the state at least in case of HandleSocks4Connect(), HandleSocks5Auth(), HandleSocks5Auth() and HandleSocks5Connect() which are called only from switch(mState) statement in DoHandshake().

> > so I would prefer to use a char[] array
> 
> How does that help the realloc situation?  If we just want a growable thing
> with some nonzero initial size, we have string classes for that... is that what
> we need here?

Of course I meant char[SOME_MAX_SIZE] with sufficient capacity. Maximum length of each request/reply is known.

> > I'm afraid that this change to improve the error handling isn't acceptable,
> 
> What problems does/could it cause?  (Genuine question; if a necko peer is
> supposed to make a call about this, it would be good to have all the relevant
> information.)

It isn't actually a problem. I just don't like too much using PR_INVALID_STATE_ERROR in the sense "ask me once again for the real error". I'm not sure what this error is normally used for and if there can be some conflict. But if it is OK to throw this error, then it can be done without the additional call to PR_ConnectContinue() and the SOCKS_REPORT_ERROR state. SOCKS code should throw a real error code and PR_INVALID_STATE_ERROR in case of fd->lower->methods->connect/connectcontinue failure. nsSocketTransport can then interpret this error code differently.
> OK, I just think that it isn't necessary to check the state

It's not a check.  It's an assertion, for in case someone changes the caller code...

> Maximum length of each request/reply is known.

Ah, ok.  Then yes, just using a fixed buffer of max length is great, esp if it's not too big.

For the error, the NSPR docs say:

254 /* Object state improper for request */
255 #define PR_INVALID_STATE_ERROR                   (-5931L)

I haven't read the patch in enough detail yet to see whether this makes sense with the patch's usage of the error code...
Comment on attachment 450585 [details] [diff] [review]
SOCKS xpcshell unit test

Looks good, just some minor comments.

>diff -r f21132993dc2 netwerk/test/unit/socks_client_subprocess.js
>--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
>+++ b/netwerk/test/unit/socks_client_subprocess.js	Thu Jun 10 18:23:36 2010 -0700
>+  var data = bin.readBytes(5);
>+  if (data == 'PING!') {
>+    print('client: got ping, sending pong.');
>+    output.write('PONG!', 5);
>+  } else {
>+    print('client: wrong data from server:', data);

I think you should write some data back to the client even in case of error. Maybe something like "Error: wrong data received".

>+for each (var arg in arguments) {
>+  print('client: running test', arg);
>+  test = arg.split(':');

var test = ...

>diff -r f21132993dc2 netwerk/test/unit/test_socks.js
>--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
>+++ b/netwerk/test/unit/test_socks.js	Thu Jun 10 18:23:36 2010 -0700
>@@ -0,0 +1,486 @@
>+const SOCKS_LISTEN_PORT = 9080;

It'd be nice if you could figure out a way to make this dynamic, so that the test won't fail on systems where this port is in use, but I wouldn't r- you just for that.
Attachment #450585 - Flags: review?(ted.mielczarek) → review+
Any progress?
I've been testing this patch on Ubuntu for months, using custom builds of Firefox. But it's getting annoying having to rebuild Firefox from scratch every time a new minor version is out... :(
Will this apply cleanly against 3.6.10?
As it is something that will be quite useful when done,
I'm interested in giving this a try on top of a local
ports build as it is nearing formal release, I hope :)
Tested on Firefox 3.5.13, it's working. Can we speed up things a bit? It's taking ages to have this bug fixed and committed to the main source tree.
This patch is now coming up on its first birthday, and has been around for at least an entire Firefox major release cycle.

Can we please get someone to review this and get it into the Firefox 4 and Fennec trees? I really don't see what the holdup is.

The Tor Project is using it in some of our custom Firefox builds, and others in the wider open source community are obviously testing and using it as well.
Flags: wanted-fennec1.0?
If I understand correctly, this is now waiting for someone to cleanup the patch as requested by reviewers in comment 47, 48, 49, 50, 51.
(In reply to comment #56)
> If I understand correctly, this is now waiting for someone to cleanup the patch
> as requested by reviewers in comment 47, 48, 49, 50, 51.

Usually the reviewers expect that the patch creator (Chris) cleans up the patch according to review comments.

Chris, would you able to get this done, or are you looking for help from someone else to do it?
(In reply to comment #57)
> (In reply to comment #56)
> > If I understand correctly, this is now waiting for someone to cleanup the patch
> > as requested by reviewers in comment 47, 48, 49, 50, 51.
> 
> Usually the reviewers expect that the patch creator (Chris) cleans up the patch
> according to review comments.
> 
> Chris, would you able to get this done, or are you looking for help from
> someone else to do it?

Yes, I'm working on it.
(In reply to comment #58)
> (In reply to comment #57)
> > (In reply to comment #56)
> > > If I understand correctly, this is now waiting for someone to cleanup the patch
> > > as requested by reviewers in comment 47, 48, 49, 50, 51.
> > 
> > Usually the reviewers expect that the patch creator (Chris) cleans up the patch
> > according to review comments.
> > 
> > Chris, would you able to get this done, or are you looking for help from
> > someone else to do it?
> 
> Yes, I'm working on it.

How is it going, Chris?
Do note that FF4 Beta 8 may freeze as soon as this week, and we are rapidly approaching the point where this patch won't be taken for Firefox 4 (if we're not already there). Perhaps bz or biesi should weigh in on if the risk here is worthwhile? Good to know Tor folks have been using and testing it, though!
If I fix up the issues from comment #47-51, can we land this in 4.0 for desktop and mobile? It seems Chris has disappeared, but I think I have time to do the final cleanup for this right now, if it will definitely get in.

Otherwise, I'll just continue updating Torbutton for FF 4.0 and mobile...
Please Mike, fix those issues. I think we are still in time for having non-blocking socks in FF4: after all FF4 Beta 8 hasn't been released yet and Chrisd's patch has been extensively tested by Tor users for seven months or so.
Mike, I'm willing to test the patch once you fix those issues and I do really hope we can have this almost-6-years-old bug fixed in Firefox 4.0.
Chris reappeared on our IRC channel yesterday is is working on the patch right now. He says a new version should be ready in a few hours.
(In reply to comment #61)
> If I fix up the issues from comment #47-51, can we land this in 4.0 for desktop
> and mobile?

Even if it doesn't land for 4.0 (worst case), the fixes would still need to be made to land for after 4.0; so better to get it done and waiting. :)

You've mentioned that this patch is already in use by Tor users, I presume that's with custom builds? Ballpark guess on number of users? Is Tor's SOCKS usage special in any way that might mean SOCKS users in non-TOR environments might hit problems with this patch? [These are the kinds of questions that can help drivers understand potential risk of accepting this for FF4.]
No idea how many people may have compiled their own firefox with this patch. I'm pretty sure we began shipping experimental Linux Tor Browser bundles with it enabled though.

As far if Tor is special, it uses SOCKS4A/SOCKS5, so it would be sending hostnames instead of IP addresses. But this exercises the more complicated parts of the code. A quick test with an SSH socks proxy should cover the rest. And then there is Chris's SOCKS unit tests that are attached that simulate various functionality.

chrisd, anything to add about how well your tests and Tor vs plain Socks4 (ie SSH) socks proxy use exercise the code?
The unit tests cover most of the SOCKS request types including IP and hostname combinations, so most possible uses should be accounted for (both Tor and non-Tor). The only request type that hasn't gotten much coverage is SOCKS5/IPv6.
Hi, thanks for reviewing, and sorry it's taken me long to respond. This updated patch should hopefully address most of what was suggested in previous comments. 

The SOCKS layer now uses a pre-allocated buffer instead of two nsCStrings. This seems to work pretty well. The Push*/Pop* methods have been renamed to Read*/Write*, with the hope that this will make things clearer. Leaving these buffer access methods around in some form is at least handy for the debugging assertions, and having a file-like api is nicer than using direct addressing everywhere, IMO. However, we can go with direct addressing if it's really preferable. I don't know if making the methods macros would give us any performance benefit, or if tuning the SOCKS handshake for performance really matters in general. SOCKS is out of the way pretty quickly and not much data is handled. The other debugging assertions are left in; they shouldn't get in the way, and better to be safe than sorry, etc.

The HandleSocks* methods for transferring SOCKS data have been split apart for reading and writing, and the states have been updated as well. The complicated code to read the variable-length SOCKS5 connection response has been split, also.

The call to Resolve() has been kept in for now, because going with AsyncResolve might be a bit difficult to get right. Since we're typically looking up the same name over and over, we should be able to take advantage of the DNS cache(?), so hopefully it would only spend a trival amount of time blocking in most cases. But, if we really should go with AsyncResolve, I'll try to find a way to get it in.

Also, I'm uncertain what to do with the code for propagating SOCKS errors using PR_INVALID_STATE_ERROR. Can this part stay, or should it be removed for now in lieu of a better strategy? There seems to be a fair amount of C code between the SOCKS layer and the socket transport, so I don't see how C++ exceptions could be used to report errors.
Attachment #450584 - Attachment is obsolete: true
Attachment #450585 - Attachment is obsolete: true
Now that the patches have been cleaned up, can we find someone to review (and hopefully approve) them asap?
(In reply to comment #70)
> can we find someone to review (and hopefully approve) them asap?
Can Michael and Ted do that? Or are they busy with something else?
Where are the reviewers?
(In reply to comment #71)
> Can Michael and Ted do that? Or are they busy with something else?

I'll do the review ASAP.
(In reply to comment #68)
> The SOCKS layer now uses a pre-allocated buffer instead of two nsCStrings. This
> seems to work pretty well. The Push*/Pop* methods have been renamed to
> Read*/Write*, with the hope that this will make things clearer. Leaving these
> buffer access methods around in some form is at least handy for the debugging
> assertions, and having a file-like api is nicer than using direct addressing
> everywhere, IMO. However, we can go with direct addressing if it's really
> preferable. I don't know if making the methods macros would give us any
> performance benefit, or if tuning the SOCKS handshake for performance really

Maybe they could be inlined.


> The call to Resolve() has been kept in for now, because going with AsyncResolve
> might be a bit difficult to get right. Since we're typically looking up the
> same name over and over, we should be able to take advantage of the DNS
> cache(?), so hopefully it would only spend a trival amount of time blocking in
> most cases. But, if we really should go with AsyncResolve, I'll try to find a
> way to get it in.

I think we can land it without AsyncResolve(). This isn't critical and can be fixed later.


> Also, I'm uncertain what to do with the code for propagating SOCKS errors using
> PR_INVALID_STATE_ERROR. Can this part stay, or should it be removed for now in
> lieu of a better strategy? There seems to be a fair amount of C code between
> the SOCKS layer and the socket transport, so I don't see how C++ exceptions
> could be used to report errors.

My fault, I should be more clear. I didn't mean throwing C++ exception. By "throw a real error" I just meant to return correct error via PR_SetError() instead of using PR_INVALID_STATE_ERROR. We could use PR_INVALID_STATE_ERROR directly to indicate that underlaying connect or connectcontinue failed and handle this state in nsSocketTransport2.cpp. Then there would be no need to call PR_ConnectContinue() again to find out the real error code.


> +   // A buffer of 1024 bytes is more than enough to hold a 255-byte hostname
> +   // alongside additional fields.
> +   static const PRUint32 BUFFER_SIZE = 1024;
> +   static const PRUint32 MAX_HOSTNAME_LEN = 255;

If I count correctly then the largest SOCKS4 request is 265 bytes and the largest SOCKS5 connect request/response is 262 bytes. So BUFFER_SIZE should be 265.


> +   case SOCKS5_WRITE_AUTH_REQUEST:
> +       if (WriteToSocket(fd) != PR_SUCCESS)
> +           return PR_FAILURE;
> +       WantRead(2);
> +       mState = SOCKS5_READ_AUTH_RESPONSE;
> +       return PR_SUCCESS;
> +   case SOCKS5_READ_AUTH_RESPONSE:

You could advance immediately to the next state instead of returning PR_SUCCESS.


> +   // Advance read offset to ignore version number
> +   mReadOffset = 1;

Shouldn't we check the version? (Also on other places where it is skipped.)


> +nsSOCKSSocketInfo::WriteString(const nsCString &str)

Use nsACString


> +   if (mDestinationHost.Length() > MAX_HOSTNAME_LEN) {
> +       LOGERROR(("socks4: destination host name is too long!"));
> +       HandshakeFinished(PR_BAD_ADDRESS_ERROR);
> +   }

"return PR_FAILURE;" is missing here


> +    switch (*type) {
> +        case 0x01: // ipv4
> +            *len = 4 - 1;
> +            break;
> +        case 0x04: // ipv6
> +            *len = 16 - 1;
> +            break;
> +        case 0x03: // fqdn
> +            *len = ReadUint8();
> +            break;
> +    }

We need to fail in case of an unknown type.


> +   NS_ABORT_IF_FALSE(mVersion == 5,
> +                     "SOCKS version must be 4 or 5!");

Incorrect message.


> +   NS_ABORT_IF_FALSE(mDataLength == 5,
> +                     "SOCKS 5 connection reply must be at least 5 bytes!");

Incorrect message.


> +   NS_ABORT_IF_FALSE(mDataLength > 5,
> +                     "SOCKS 5 connection reply must be at least 5 bytes!");

Again incorrect message in ReadV5ConnectResponseBottom(). And I think it would make more sense to move it after the ReadV5AddrTypeAndLength() and check if mDataLength == (7+len).


> +   // The port is the last field in the response. If data remains, something's
> +   // off.
> +   LOGDEBUG(("socks5: remaining data %u",
> +           unsigned(mDataLength - mReadOffset)));

Is there some reason to allow this? We are asking for some exact amount and we should get it.


> +   do {
> +       status = info->DoHandshake(fd, oflags);
> +   } while (status == PR_SUCCESS && !info->IsConnected());

No timeout is passed to DoHandShake() so PR_INTERVAL_NO_TIMEOUT is used, which is wrong. We should remember the timeout passed to nsSOCKSIOLayerConnect(). E.g. set the timeout via new method info->SetConnectTimeout(to) before the do-while loop. Anyway, to handle the timeout correctly we should limit the whole process (connecting + handshake), but now we limit just the connecting, right?


> +   if (!info->IsConnected()) {
> +       *out_flags = 0;
> +       //LOGDEBUG(("socks: polling %hd", info->GetPollFlags()));
> +       return info->GetPollFlags();
> +   }

Remove the commented out LOGDEBUG().
Michal - It looks like Chris is MIA again. I think he may be getting weary of the back-and-forth on this.. I believe he's also a student, so his time is probably available only randomly and periodically depending on classes.

Since this is your second review, and some of your comments look either new this time, trivial, and/or still a little unclear, can you possibly take care of them?

Otherwise, I can try to go through and fix them myself sometime this week..
I'll update the patch.
Assignee: chrisd.mang → michal.novotny
Attached patch updated patch (obsolete) — Splinter Review
Attachment #494975 - Attachment is obsolete: true
Attachment #510446 - Flags: review?(bzbarsky)
I'm probabably the wrong reviewer for this: I don't know anything about SOCKS proxies or really this part of the necko code.  If there's absolutely no one else to do the review I can try to learn this code well enough to review, but that will take time.  :(

That said, I thought Michal reviewed this already.  So what part am I supposed to review and why?
(In reply to comment #78)
> That said, I thought Michal reviewed this already.  So what part am I supposed
> to review and why?

I think someone else should look at the changes between attachment 494975 [details] [diff] [review] and 510446. Otherwise, I would in fact review my own code.
OK, can you post an interdiff between those two attachments?  That would make it a lot easier to see what was changed...
Attached patch interdiff 494975 - 510446 (obsolete) — Splinter Review
Comment on attachment 510446 [details] [diff] [review]
updated patch

>+        // A SOCKS request was rejected, get the actual error code from
>+        // the OS error

s/,/;/

The fallthrough in nsSOCKSSocketInfo::DoHandshake looks purposeful, but why do we want that?  I mean the fallthrough when going from the various REQUEST states to the corresponding RESPONSE states.  If we didn't do that, would the caller just end up calling right back into DoHandshake, so this is basically an optimization or something?  If so, why do we want to make the optimization?  Seems to make the code more confusing to me....

r=me with the typo fixed and the DoHandshake stuff explained.  And thank you for posting the interdiff!  It was very useful.
Attachment #510446 - Flags: review?(bzbarsky) → review+
And to be clear, I only read the interdiff and the comments on the bug....
(In reply to comment #82)
> The fallthrough in nsSOCKSSocketInfo::DoHandshake looks purposeful, but why do
> we want that?  I mean the fallthrough when going from the various REQUEST
> states to the corresponding RESPONSE states.  If we didn't do that, would the
> caller just end up calling right back into DoHandshake, so this is basically an
> optimization or something?  If so, why do we want to make the optimization? 

You understand it correctly. It is just an optimization to not to go back to nsSOCKSIOLayerConnect() or nsSOCKSIOLayerConnectContinue() and pass to the RESPONSE state directly. If we didn't do that the while loop in nsSOCKSIOLayerConnect() or nsSOCKSIOLayerConnectContinue() would call DoHandshake() again immediately. I can remove it if you think that it makes the code hard to read.
Unless we hit this code a whole lot, I'd rather not do the optimization, so the control flow is clearer, yeah.

Of course if we have profile data showing this matters we should keep the optimization!
Attachment #510446 - Attachment is obsolete: true
Attachment #512930 - Attachment is obsolete: true
Attachment #514770 - Flags: approval2.0?
Comment on attachment 514770 [details] [diff] [review]
removed optimization in DoHandshake()

I want this to land as soon as possible, but *not* before 2.0. It's just too late now, and the reward isn't high enough IMO.
Attachment #514770 - Flags: approval2.0? → approval2.0-
Depends on: post2.0
Attached patch updated patchSplinter Review
> >+        // A SOCKS request was rejected, get the actual error code from
> >+        // the OS error
> 
> s/,/;/

I forgot to do this change in the previous patch.
Attachment #514770 - Attachment is obsolete: true
Sorry, I backed this out of cedar as test_socks.js was failing on NT.

http://hg.mozilla.org/projects/cedar/rev/f17c1a0a7ca3

e.g.
http://tinderbox.mozilla.org/showlog.cgi?log=Cedar/1300948416.1300950704.20889.gz#err0

server: using test case socks4:50743:127.0.0.1:8000:local
server: using test case socks4:50743:12345.xxx:8001:remote
server: using test case socks:50743:127.0.0.1:8002:local
server: using test case socks:50743:abcdefg.xxx:8003:remote
TEST-UNEXPECTED-FAIL | c:/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_socks.js | false == true - See following stack:
JS frame :: c:\talos-slave\test\build\xpcshell\head.js :: do_throw :: line 439
JS frame :: c:\talos-slave\test\build\xpcshell\head.js :: do_check_eq :: line 491
JS frame :: c:\talos-slave\test\build\xpcshell\head.js :: do_check_true :: line 503
JS frame :: c:/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_socks.js :: runScriptSubprocess :: line 44
JS frame :: c:/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_socks.js :: <TOP_LEVEL> :: line 431
JS frame :: c:/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_socks.js :: run_test :: line 483
JS frame :: c:\talos-slave\test\build\xpcshell\head.js :: _execute_test :: line 322
Whiteboard: fixed-in-cedar
No longer depends on: post2.0
Whiteboard: not-ready
(In reply to comment #90)

> JS frame ::
> c:/talos-slave/test/build/xpcshell/tests/netwerk/test/unit/test_socks.js ::
> runScriptSubprocess :: line 44

Michal, are you planning to investigate this? (Just checking since this bug has had lurchy progress in the past.)

From a quick skim it seems the test is having problems finding xpcshell.exe. Not sure offhand why that would be.
Possibly dumb question: is the patch from Michal included in the FF 4.0 release?
No.  This patch is not included anywhere yet, since it failed tests when it was checked in.
Dumb question, in the patch "SOCKS xpshell test with automatic port discovery":

+  bin.append("xpcshell");
+  if (!bin.exists()) {
+    bin.leafName = "xpshell.exe";
+    do_check_true(bin.exists());
+    if (!bin.exists())
+      do_throw("Can't find xpshell binary");

Shouldn't it be xpcshell.exe instead of xpshell.exe?
Uh, yeah, good catch. Somehow we missed that.
For what it's worth, I pushed this patch with the correction from comment 94 to try for a Windows xpcshell run; if that passes, I'll land it.
Yeah, that went green.  I pushed to cedar:

  http://hg.mozilla.org/mozilla-central/rev/31dce5e73ca4
  http://hg.mozilla.org/mozilla-central/rev/1fe0a6fa91ea
Flags: in-testsuite+
Whiteboard: not-ready → fixed-in-cedar
http://hg.mozilla.org/mozilla-central/rev/31dce5e73ca4
http://hg.mozilla.org/mozilla-central/rev/1fe0a6fa91ea
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Whiteboard: fixed-in-cedar
Target Milestone: --- → mozilla2.2
Which branch is cedar? If this isn't making it into a 4.0 dot release, the Tor Project is going to have to fork Firefox and remove Torbutton from addons.mozilla.org, and encourage our users to only use our Tor Browser Bundle instead.

The reason for this is that the existing HTTP to SOCKS conversion proxies we need to work around this SOCKS issue (Polipo/Privoxy) are unmaintained and have a number of serious stability and security bugs that make it simpler for us to use our own Firefox builds instead of maintaining the same fork for Polipo/Privoxy...
> Which branch is cedar?

A temporary branch being used for landing patches right now.

The important part is comment 98, where the patch was pushed to mozilla-central.

I'm afraid that this fix is not going to be in a 4.0 dot release unless our dot-release criteria have changed radically.  On the other hand, it's not entirely clear whether there will be 4.0 dot releases yet.

This fix _will_ be in Firefox 5, which should be shipping in June.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: