Bug 617723 - tstclnt fails to connect to fe80::1%lo0
Status: NEW
Product: NSS
Classification: Components
Component: Test
Version: trunk
Platform: x86 Mac OS X
Importance: -- normal
Assigned To: nobody
Depends on: 636504

Reported: 2010-12-08 13:33 PST by Kai Engert (:kaie)
Modified: 2014-08-12 12:03 PDT
CC List: 5 users


Attachments
test patch (1.78 KB, patch), 2010-12-16 10:57 PST, Kai Engert (:kaie)
Patch v2 to selfserv (8.52 KB, patch), 2010-12-21 06:48 PST, Kai Engert (:kaie)
Complementary patch for ssl test scripts (4.06 KB, patch), 2010-12-22 09:28 PST, Elio Maldonado
combined patch v3 (16.83 KB, patch), 2010-12-22 14:18 PST, Kai Engert (:kaie); flags: wtc: review-
Patch for Fedora's NSS package (DO NOT CHECK IN) (1.20 KB, patch), 2011-02-23 18:38 PST, Wan-Teh Chang

Description Kai Engert (:kaie) 2010-12-08 13:33:23 PST
I'm filing this initially against nss/test,
although we might conclude it's a problem in nspr.

I'm trying to run the NSS test suite on an Intel Mac OS X machine.

At some point the test suite starts "selfserv" and tries to connect to it using "tstclnt".

The hostname of that machine is: macmini-intel.local

tstclnt says:
tstclnt: connecting to macmini-intel.local:8111 (address=fe80::1%lo0) 

Running selfserv with the "-v" option, I see that the connection never succeeds.
Eventually tstclnt gives up with a timeout error message.

The hostname macmini-intel.local is valid on that machine.
I can use "ping macmini-intel.local" and "telnet macmini-intel.local 25" successfully.

However, "ping fe80::1%lo0" fails with "unknown host".


This means that tstclnt (or NSPR) resolves the hostname "macmini-intel.local" to a network address that doesn't work.
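A quick way to see every address the resolver returns for a hostname, rather than only the first one tstclnt uses, is a small program built on plain POSIX getaddrinfo()/getnameinfo() instead of NSPR's resolver. This is a sketch under that assumption; the helper name print_all_addrs is hypothetical, and whether the numeric output carries a scope suffix such as %lo0 depends on the platform's getnameinfo().

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: print every address getaddrinfo() returns for
 * `host` and return how many were printed, or -1 if the lookup fails. */
int print_all_addrs(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    char numeric[1025]; /* large enough for any numeric host string */
    int count = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* ask for both IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* one entry per address, TCP only */

    int rv = getaddrinfo(host, port, &hints, &res);
    if (rv != 0) {
        fprintf(stderr, "getaddrinfo(%s): %s\n", host, gai_strerror(rv));
        return -1;
    }
    /* Walk the list in resolver order; a first-match client uses only
     * the first entry printed here. */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        /* NI_NUMERICHOST: print the address itself, no reverse lookup. */
        if (getnameinfo(ai->ai_addr, ai->ai_addrlen, numeric,
                        sizeof(numeric), NULL, 0, NI_NUMERICHOST) == 0) {
            printf("%s -> %s (%s)\n", host, numeric,
                   ai->ai_family == AF_INET6 ? "IPv6" : "IPv4");
            count++;
        }
    }
    freeaddrinfo(res);
    return count;
}
```

Calling print_all_addrs("macmini-intel.local", "8111") on the affected machine would show whether fe80::1%lo0 comes back ahead of any IPv4 address.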
Comment 1 Kaspar Brand 2010-12-09 00:10:34 PST
(In reply to comment #0)
> However, "ping fe80::1%lo0" fails with "unknown host".

You should use "ping6" instead (this will result in request timeout errors, but not in a name resolution failure).
Comment 2 Wan-Teh Chang 2010-12-09 10:50:40 PST
Kai, what is the exact timeout error message when
tstclnt gives up eventually?

Can you edit mozilla/nsprpub/pr/src/misc/prnetdb.c,
and go to the following lines:

2041         memset(&hints, 0, sizeof(hints));
2042         hints.ai_flags = (flags & PR_AI_NOCANONNAME) ? 0: AI_CANONNAME;
2043         hints.ai_family = (af == PR_AF_INET) ? AF_INET : AF_UNSPEC;

After line 2043, add

             hints.ai_flags |= AI_ADDRCONFIG;

Does that help?
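For reference, the hints that the suggested one-line change produces can be written out as a standalone sketch with plain POSIX types. make_hints and its want_canonname/ipv4_only parameters are hypothetical stand-ins for the PR_AI_NOCANONNAME and PR_AF_INET checks in prnetdb.c. AI_ADDRCONFIG only filters a family out when the host has no address of that family configured (glibc, for instance, does not count loopback addresses for this check), so it may not help on a machine that does have IPv6 configured.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper mirroring the prnetdb.c lines quoted above, with
 * AI_ADDRCONFIG added after the ai_family assignment. */
struct addrinfo make_hints(int want_canonname, int ipv4_only)
{
    struct addrinfo hints;

    memset(&hints, 0, sizeof(hints));
    hints.ai_flags = want_canonname ? AI_CANONNAME : 0;
    hints.ai_family = ipv4_only ? AF_INET : AF_UNSPEC;
    /* Return addresses of a family only if the local system has an
     * address of that family configured. */
    hints.ai_flags |= AI_ADDRCONFIG;
    return hints;
}
```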

Also, please edit tstclnt.c, and go to these lines:

688         do {
689             enumPtr = PR_EnumerateAddrInfo(enumPtr, addrInfo, portno, &addr);
690         } while (enumPtr != NULL &&
691                  addr.raw.family != PR_AF_INET &&
692                  addr.raw.family != PR_AF_INET6);

After line 689, add:

                if (enumPtr != NULL)
                    printHostNameAndAddr(host, &addr);

What messages of the kind
  tstclnt: connecting to macmini-intel.local:8111 (address=xxxx) 
does tstclnt print?  I'm only interested in the xxxx part in the
parentheses.

Thanks a lot for your help.
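The tstclnt loop quoted above keeps only the first IPv4 or IPv6 entry from the lookup. A hypothetical plain-POSIX analogue (first_inet_addr is an illustrative name, not an NSS function) makes the consequence easy to see: whatever comes first in the list is the only address ever tried.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical POSIX analogue of the tstclnt do/while loop: return the
 * first IPv4 or IPv6 entry in the getaddrinfo() list. Later entries are
 * ignored, so if the first one is unreachable (for example a link-local
 * IPv6 address that nobody listens on), the connection fails even when
 * another address in the list would have worked. */
const struct addrinfo *first_inet_addr(const struct addrinfo *list)
{
    const struct addrinfo *ai;

    for (ai = list; ai != NULL; ai = ai->ai_next) {
        if (ai->ai_family == AF_INET || ai->ai_family == AF_INET6)
            return ai;
    }
    return NULL; /* no usable entry */
}
```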
Comment 3 Kai Engert (:kaie) 2010-12-16 10:54:47 PST
I removed the entry from /etc/hosts to restore the original config.

Running
DYLD_LIBRARY_PATH=../lib/ ./selfserv -D -p 8573 -d /Users/kengert/tcp-bug/server -n macmini-intel.local -e macmini-intel.local-ec -w nss -r -v

and
DYLD_LIBRARY_PATH=../lib/ ./tstclnt -p 8573 -h macmini-intel.local  -q -d /Users/kengert/tcp-bug/client -v < /Users/kengert/tcp-bug/sslreq.dat

where sslreq.dat contains
GET / HTTP/1.0
(+extra linefeed)


(In reply to comment #2)
> Kai, what is the exact timeout error message when
> tstclnt gives up eventually?

tstclnt: connecting to macmini-intel.local:8573 (address=fe80::1%lo0)
tstclnt: Client timed out while waiting for connection to server: Connection refused by peer.


> Can you edit mozilla/nsprpub/pr/src/misc/prnetdb.c,
> and go to the following lines:
> 
> 2041         memset(&hints, 0, sizeof(hints));
> 2042         hints.ai_flags = (flags & PR_AI_NOCANONNAME) ? 0: AI_CANONNAME;
> 2043         hints.ai_family = (af == PR_AF_INET) ? AF_INET : AF_UNSPEC;
> 
> After line 2043, add
> 
>              hints.ai_flags |= AI_ADDRCONFIG;
> 
> Does that help?

No, same output.

tstclnt: connecting to macmini-intel.local:8573 (address=fe80::1%lo0)
tstclnt: Client timed out while waiting for connection to server: Connection refused by peer.


> After line 689, add:
>                 if (enumPtr != NULL)
>                     printHostNameAndAddr(host, &addr);

This duplicates the connection message; now I get:

tstclnt: connecting to macmini-intel.local:8573 (address=fe80::1%lo0)
tstclnt: connecting to macmini-intel.local:8573 (address=fe80::1%lo0)
tstclnt: Client timed out while waiting for connection to server: Connection refused by peer.
Comment 4 Kai Engert (:kaie) 2010-12-16 10:57:01 PST
Created attachment 498146
test patch

For reference I attach the patch used for the previous test.
Comment 5 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2010-12-16 15:41:09 PST
Firefox 4.0/4.x blocker bug 614526 seems to be related.
Comment 6 Kai Engert (:kaie) 2010-12-21 05:15:20 PST
In my understanding, the problem is that

  selfserv binds to IPv4 only, not to IPv6

  tstclnt looks up an IPv6 address and connects using an IPv6 socket
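The mismatch described in this comment can be reproduced outside NSS. The following sketch assumes a POSIX system, and the helper names are hypothetical. It opens an IPv4-only listener (here on 127.0.0.1, with port 0 so the kernel picks a free port), then tries to reach the same port over ::1. That attempt is expected to fail, typically with ECONNREFUSED, which matches the "Connection refused" tstclnt reported.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: listen on 127.0.0.1 only (an IPv4-only server).
 * Returns the listening fd and stores the kernel-chosen port, in network
 * byte order, in *port. Returns -1 on failure. */
int listen_ipv4_loopback(in_port_t *port)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = 0; /* let the kernel pick a free port */
    if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        listen(fd, 1) < 0 ||
        getsockname(fd, (struct sockaddr *)&sin, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = sin.sin_port;
    return fd;
}

/* Hypothetical helper: connect to ::1 on `port` (network byte order), as
 * a client does after resolving the name to an IPv6 address.
 * Returns 0 on success, -1 on failure. */
int connect_ipv6_loopback(in_port_t port)
{
    struct sockaddr_in6 sin6;
    int fd = socket(AF_INET6, SOCK_STREAM, 0);
    int rv;

    if (fd < 0)
        return -1; /* no IPv6 support at all */
    memset(&sin6, 0, sizeof(sin6));
    sin6.sin6_family = AF_INET6;
    sin6.sin6_addr = in6addr_loopback;
    sin6.sin6_port = port;
    rv = connect(fd, (struct sockaddr *)&sin6, sizeof(sin6));
    close(fd);
    return rv == 0 ? 0 : -1;
}
```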
Comment 7 Kai Engert (:kaie) 2010-12-21 06:48:00 PST
Created attachment 499015
Patch v2 to selfserv

As said before, I believe the problem is not with tstclnt. The problem is that tstclnt attempts to connect to a hostname that resolves to an IPv6 address, and nobody listens on an IPv6 socket.

I think selfserv should be enhanced so that it can listen on an IPv6 socket.

I'm attaching a patch that adds a new parameter to selfserv.
The parameter -I is used to pass a hostname to selfserv.
Selfserv will use a test similar to the one found in tstclnt.

If selfserv learns the given hostname resolves to an IPv6 address, then selfserv will listen on an IPv6 socket.
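The -I idea can be sketched with plain POSIX calls: resolve the same hostname the client will use and open the listener in the family of the first result. This is not the attached selfserv patch; listen_like_client is a hypothetical name, and binding to the resolved address (rather than to the wildcard address of that family) is an assumption of this sketch.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch of "-I <bind-host>": resolve the host the client
 * will connect to and listen in the same address family as the first
 * result, i.e. the entry a first-match client would pick.
 * Returns a listening fd or -1. */
int listen_like_client(const char *bind_host, const char *port)
{
    struct addrinfo hints, *res;
    int fd;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* same lookup the client does */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(bind_host, port, &hints, &res) != 0)
        return -1;
    fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd >= 0 && (bind(fd, res->ai_addr, res->ai_addrlen) < 0 ||
                    listen(fd, 5) < 0)) {
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

If the name resolves to fe80::1%lo0 first, this opens an IPv6 listener on that address; if it resolves to an IPv4 address first, it opens an IPv4 listener.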


With this patch applied to selfserv,
in the test scenario reported in this bug,
and with an unmodified tstclnt,
the connection to fe80::1%lo0 succeeds.


I have two proposals:

(a)
- use this patch v2
- in addition, change the NSS test suite, and whenever we start a selfserv,
  use the new additional -I <bind-host> parameter

(b)
- enhance this patch v2 to automatically choose between IPv6 and IPv4
  based on the hostname found in the environment variables


My preference is (a), because currently the HOSTNAME parameter is used in the test suite scripts but is never accessed from the NSS cmd C code.
Comment 8 Elio Maldonado 2010-12-22 09:28:14 PST
Created attachment 499313
Complementary patch for ssl test scripts

I applied the attached patch to the test scripts together with Kai's patch. On Fedora 14 the connections worked and all test suites passed. Tested with IPv6 both on and off.
Comment 9 Elio Maldonado 2010-12-22 09:36:24 PST
My test invocation, which enables all test suites and sets HOSTNAME:
export HOSTNAME=localhost.localdomain
HOST=localhost DOMSUF=localdomain PORT=$MYRAND NSS_CYCLES="" NSS_TESTS="" NSS_SSL_TESTS="" NSS_SSL_RUN="" ./all.sh
On UNIX/Linux one could also use: export HOSTNAME=`uname -n`
Comment 10 Kai Engert (:kaie) 2010-12-22 14:14:26 PST
Both patches combined give me a successful build and test run on the original Mac OS X machine, too.
Comment 11 Kai Engert (:kaie) 2010-12-22 14:18:55 PST
Created attachment 499383
combined patch v3

This patch combines Elio's patch and my patch.

I've removed a debug-printf that was left in.
Comment 12 Wan-Teh Chang 2011-01-30 20:17:22 PST
(In reply to comment #6)
Kai Engert wrote:
> In my understanding, the problem is that
> 
>   selfserv binds to IPv4 only, not to IPv6
> 
>   tstclnt looks up an IPv6 address and connects using an IPv6 socket

Thank you for tracking down this bug.  You are right on.
You rediscovered a known bug of selfserv: bug 388117.
(Bug 366614 is the tracking bug on IPv6 support in NSS.)

The best fix is to finish Nelson's work on bug 388117,
which will allow selfserv to listen on both IPv4 and IPv6.
For some unknown reason, Nelson's patch caused test failures
on some Tinderboxes and was backed out (see bug 388117
comment 26).  See also the IP-version agnostic sample server at
http://msdn.microsoft.com/en-us/library/ms738639%28v=VS.85%29.aspx

As a short-term fix for Linux (NSS package self test),
selfserv can open a dual-stack AF_INET6 listening socket,
which can serve both IPv4 and IPv6 clients.
Comment 13 Elio Maldonado 2011-02-22 09:35:49 PST
(In reply to comment #12)
This bug is now blocking testing for RHEL 6, where we are required to enable all test suites as part of the QE acceptance.
> 
> As a short-term fix for Linux (NSS package self test),
> selfserv can open a dual-stack AF_INET6 listening socket,
> which can serve both IPv4 and IPv6 clients.

Wan-Teh, could you post a patch to selfserv with this short-term fix? I will gladly pick it up. Thanks in advance.
Comment 14 Wan-Teh Chang 2011-02-23 18:38:40 PST
Created attachment 514702
Patch for Fedora's NSS package (DO NOT CHECK IN)

Elio: this patch should allow you to run NSS all.sh when you
build Fedora's NSS package.  Please test it on Linux.

It changes selfserv to use a dual-stack IPv6 listening socket, which
can accept connections from both IPv4 and IPv6 clients.  NSPR's
IPv6 sockets have the IPV6_V6ONLY socket option default to false.
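A minimal POSIX sketch of such a dual-stack listener follows; open_dual_stack_listener is a hypothetical name, not the code in attachment 514702. It clears IPV6_V6ONLY explicitly rather than relying on a default, because the default differs between platforms (Linux follows the net.ipv6.bindv6only sysctl, normally 0, while Windows defaults to IPv6-only). IPv4 clients then show up as IPv4-mapped addresses such as ::ffff:127.0.0.1.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open one AF_INET6 listening socket on `port`
 * (host byte order; 0 lets the kernel choose) that also accepts IPv4
 * clients. Returns the fd, or -1 if IPv6 or dual-stack mode is
 * unavailable. */
int open_dual_stack_listener(unsigned short port)
{
    struct sockaddr_in6 sin6;
    int off = 0;
    int fd = socket(AF_INET6, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;                   /* no IPv6 support on this host */
    /* Explicitly allow IPv4-mapped connections on this socket. */
    if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &off, sizeof(off)) < 0) {
        close(fd);
        return -1;                   /* stack refuses dual-stack mode */
    }
    memset(&sin6, 0, sizeof(sin6));
    sin6.sin6_family = AF_INET6;
    sin6.sin6_addr = in6addr_any;    /* "::" also covers IPv4 when V6ONLY is 0 */
    sin6.sin6_port = htons(port);
    if (bind(fd, (struct sockaddr *)&sin6, sizeof(sin6)) < 0 ||
        listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```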
Comment 15 Wan-Teh Chang 2011-02-23 18:54:26 PST
Comment on attachment 514702
Patch for Fedora's NSS package (DO NOT CHECK IN)

Kai, if you still have that Mac, please test this patch on it.
The patch should also solve the problem on your Mac.  Thanks.
Comment 16 Elio Maldonado 2011-02-24 06:43:04 PST
It works for Fedora (Rawhide/F15/F14) and RHEL-6.
Comment 17 Wan-Teh Chang 2011-02-24 10:45:17 PST
The proper fix for selfserv, using the approach described in
http://msdn.microsoft.com/en-us/library/ms738639%28v=VS.85%29.aspx ,
requires the new NSPR wrapper function for getaddrinfo proposed
in bug 636504 to determine whether selfserv needs to open an
IPv4 or pure-IPv6 (not dual-stack) listening socket, or both.
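Without the proposed NSPR wrapper (its API is not shown in this bug), the same strategy can be sketched with plain POSIX getaddrinfo(): ask for the passive (wildcard) addresses of every family and open one listener per result, making the IPv6 socket IPv6-only so it does not also claim the IPv4 port. open_passive_listeners and MAX_LISTENERS are hypothetical names, and the sketch does not claim to match the MSDN sample line for line.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_LISTENERS 4 /* hypothetical upper bound on listening sockets */

/* Hypothetical sketch: open one listener per wildcard address that
 * getaddrinfo() reports (typically 0.0.0.0 and ::). IPv6 sockets are set
 * IPv6-only so they do not collide with the IPv4 socket on the same
 * port. Returns the number of fds stored in fds[] (0 if none). */
int open_passive_listeners(const char *port, int fds[MAX_LISTENERS])
{
    struct addrinfo hints, *res, *ai;
    int n = 0, on = 1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;     /* NULL host -> wildcard addresses */

    if (getaddrinfo(NULL, port, &hints, &res) != 0)
        return 0;
    for (ai = res; ai != NULL && n < MAX_LISTENERS; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);

        if (fd < 0)
            continue; /* family not supported here; try the next one */
        if (ai->ai_family == AF_INET6 &&
            setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on)) < 0) {
            close(fd);
            continue;
        }
        if (bind(fd, ai->ai_addr, ai->ai_addrlen) < 0 || listen(fd, 5) < 0) {
            close(fd);
            continue;
        }
        fds[n++] = fd;
    }
    freeaddrinfo(res);
    return n;
}
```

With a fixed port such as "8443", this would typically yield one IPv4 and one pure-IPv6 listener on a dual-stack host, and a single IPv4 listener on a host without IPv6 support; the server then has to poll all returned fds.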
Comment 18 Kai Engert (:kaie) 2011-02-24 15:29:41 PST
(1)
Wan-Teh, I still don't understand which "works always" strategy you're proposing, even after having attempted to implement bug 636504.

(2)
Your earlier argument in this bug was that it should not be necessary to pass an argument to selfserv to make things work.
I personally think your request unnecessarily makes life much harder for us.

We already pass the hostname to tstclnt, and this is what tstclnt will connect to.

On the other hand, you require selfserv to be smart enough to guess on its own where to listen.

I would prefer to avoid additional and take the patch that I have made, because it works and because it gives equal information to both selfserv and tstclnt.


After having attempted to implement bug 636504 (which might not be what you had in mind), I found that it still doesn't work for selfserv.

Well, maybe my patch is wrong. But with the patch, selfserv still listens on an IPv4 server socket, while the hostname tstclnt is using, "macmini-intel.local", is still resolved by NSPR to an IPv6 target address.


Sorry, I have no idea how to solve this bug.

I guess it could be solved by using a socket that listens on both stacks. But then we need code that reliably tells us when we should do that, when it's safe or not, and how to do it. I don't know if the approach to listen on both stacks is compatible with all platforms we have to support.


Sorry, I believe I have spent a lot of time on this already and have even provided a fully working solution, yet I still don't see clearly how exactly this should be solved to make you comfortable with the solution.
Comment 19 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-02-24 16:01:30 PST
(In reply to comment #18)
> (2)
> Your earlier argument in this bug was that it should not be necessary to pass an
> argument to selfserv to make things work.
> I personally think your request unnecessarily makes life much harder for us.

It would be useful to at least be *able* to pass an argument to selfserv to force it to be IPv4-only or IPv6-only, for (ad-hoc) testing.

> I guess it could be solved by using a socket that listens on both stacks.
> But then we need code that reliably tells us when we should do that,
> when it's safe or not, and how to do it. I don't know if the approach
> to listen on both stacks is compatible with all platforms we have
> to support.

AFAICT: If the target platform is Windows and GetVersionEx() returns 5.0 (or 5.2?), then you need to try to open two sockets (one IPv4, one IPv6), but either one may fail. Otherwise you can use a single socket as in attachment 514702.
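The Windows-version check above could also be replaced by a runtime probe: try the single dual-stack socket first and fall back to separate IPv6-only and IPv4 sockets if clearing IPV6_V6ONLY or binding fails. The sketch below shows that idea with POSIX calls only (Windows would need the Winsock equivalents, such as closesocket()); the function names are hypothetical, and it assumes that a stack without dual-stack support rejects the setsockopt() call, which would have to be verified per platform.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: bind a wildcard listener of `family` to `port`
 * (host byte order). For AF_INET6, IPV6_V6ONLY is set to `v6only`;
 * the argument is ignored for AF_INET. Returns the fd or -1. */
static int wildcard_listener(int family, unsigned short port, int v6only)
{
    struct sockaddr_storage ss;
    socklen_t len;
    int fd = socket(family, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&ss, 0, sizeof(ss));
    if (family == AF_INET6) {
        struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)&ss;

        if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY,
                       &v6only, sizeof(v6only)) < 0) {
            close(fd);
            return -1;
        }
        sin6->sin6_family = AF_INET6;
        sin6->sin6_addr = in6addr_any;
        sin6->sin6_port = htons(port);
        len = sizeof(*sin6);
    } else {
        struct sockaddr_in *sin = (struct sockaddr_in *)&ss;

        sin->sin_family = AF_INET;
        sin->sin_addr.s_addr = htonl(INADDR_ANY);
        sin->sin_port = htons(port);
        len = sizeof(*sin);
    }
    if (bind(fd, (struct sockaddr *)&ss, len) < 0 || listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Prefer one dual-stack socket; if that cannot be set up, fall back to
 * an IPv6-only socket plus an IPv4 socket, either of which may fail.
 * Stores the fds in fds[] and returns how many were opened (0 to 2). */
int open_listeners_with_fallback(unsigned short port, int fds[2])
{
    int n = 0;
    int fd = wildcard_listener(AF_INET6, port, 0);

    if (fd >= 0) {
        fds[n++] = fd;
        return n;
    }
    fd = wildcard_listener(AF_INET6, port, 1);
    if (fd >= 0)
        fds[n++] = fd;
    fd = wildcard_listener(AF_INET, port, 0);
    if (fd >= 0)
        fds[n++] = fd;
    return n;
}
```

In practice a fixed port would be passed; with port 0 the two fallback sockets would each receive a different ephemeral port.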
Comment 20 Sachin Kumar Gupta 2014-06-03 22:38:25 PDT
This problem persists even in the latest version. tstclnt works fine for IPv4 but fails for IPv6, giving a selfserv connection timeout error. The tstclnt code mentioned above has undergone a huge number of changes, so I am not able to use the patches mentioned above. Kindly help; I am attaching the relevant logs of the test failure.

Platform:CentOS release 6.5 

This is what i got on running the NSS testsuite.
.......................................................................................................................................

selfserv -D -p 8443 -d ../server -n localhost6.localdomain6  \
         -e localhost6.localdomain6-ec -w nss -r -i ../tests_pid.17821  &
trying to connect to selfserv at Tue Jun  3 15:03:24 IST 2014
tstclnt -p 8443 -h localhost6.localdomain6  -q \
        -d ../client -v < /home/nitin/Sachin_NSS/NSS-IPV61/nss-3.16-with-nspr-4.10.4/nss-3.16/nss/tests/ssl/sslreq.dat
tstclnt: connecting to localhost6.localdomain6:8443 (address=::1)
tstclnt: Client timed out while waiting for connection to server: PR_CONNECT_RESET_ERROR: TCP connection reset by peer
retrying to connect to selfserv at Tue Jun  3 15:04:30 IST 2014
tstclnt -p 8443 -h localhost6.localdomain6  -q \
        -d ../client -v < /home/nitin/Sachin_NSS/NSS-IPV61/nss-3.16-with-nspr-4.10.4/nss-3.16/nss/tests/ssl/sslreq.dat
tstclnt: connecting to localhost6.localdomain6:8443 (address=::1)
tstclnt: Client timed out while waiting for connection to server: PR_CONNECT_RESET_ERROR: TCP connection reset by peer
ssl.sh: #727: Waiting for Server - FAILED

..........................................................................................................................................
Comment 21 Kai Engert (:kaie) 2014-08-12 12:03:44 PDT
Sachin, I have explained the situation from my point of view in comment 18; there is nothing else I can add. Making progress requires a compromise proposal from another NSS peer.
