Open Bug 617723 Opened 12 years ago Updated 4 months ago

tstclnt fails to connect to fe80::1%lo0

Categories

(NSS :: Test, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

People

(Reporter: KaiE, Unassigned)

References

(Depends on 1 open bug)

Details

Attachments

(3 files, 2 obsolete files)

I'm filing this initially against nss/test,
although we might conclude it's a problem in nspr.

I'm trying to run the NSS test suite on a Mac OSX Intel machine.

At some point the test suite starts "selfserv" and tries to connect to it using "tstclnt".

The hostname of that machine is: macmini-intel.local

tstclnt says:
tstclnt: connecting to macmini-intel.local:8111 (address=fe80::1%lo0) 

Using selfserv with "-v" option, I see the connection never succeeds.
Eventually tstclnt gives up with a timeout error message.

The hostname macmini-intel.local is valid on that machine.
I can use "ping macmini-intel.local" and "telnet macmini-intel.local 25" successfully.

However, "ping fe80::1%lo0" fails with "unknown host".


This means that tstclnt (or nspr) resolves the hostname "macmini-intel.local" to a network address that doesn't work.
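
For context, fe80::1%lo0 is an IPv6 link-local address (prefix fe80::/10) qualified with the zone "lo0", so it is only meaningful on that one interface. A minimal sketch of how such an address can be recognized, using plain POSIX calls rather than NSPR; the helper name is_link_local_v6 is hypothetical and not part of NSS or NSPR:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Returns 1 if `text` is an IPv6 link-local address (fe80::/10),
 * 0 otherwise. An optional "%zone" suffix such as "%lo0" is ignored,
 * because inet_pton() does not accept zone identifiers. */
int is_link_local_v6(const char *text)
{
    char buf[INET6_ADDRSTRLEN];
    struct in6_addr addr;
    const char *zone;
    size_t len;

    assert(text != NULL);

    /* Copy only the part before the zone separator, if there is one. */
    zone = strchr(text, '%');
    len = zone ? (size_t)(zone - text) : strlen(text);
    if (len >= sizeof(buf))
        return 0;
    memcpy(buf, text, len);
    buf[len] = '\0';

    if (inet_pton(AF_INET6, buf, &addr) != 1)
        return 0;   /* not a valid IPv6 literal */
    return IN6_IS_ADDR_LINKLOCAL(&addr) ? 1 : 0;
}
```

With this helper, is_link_local_v6("fe80::1%lo0") yields 1 and is_link_local_v6("::1") yields 0.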
(In reply to comment #0)
> However, "ping fe80::1%lo0" fails with "unknown host".

You should use "ping6" instead (this will result in request timeout errors, but not in a name resolution failure).
Kai, what is the exact timeout error message when
tstclnt gives up eventually?

Can you edit mozilla/nsprpub/pr/src/misc/prnetdb.c,
and go to the following lines:

2041         memset(&hints, 0, sizeof(hints));
2042         hints.ai_flags = (flags & PR_AI_NOCANONNAME) ? 0: AI_CANONNAME;
2043         hints.ai_family = (af == PR_AF_INET) ? AF_INET : AF_UNSPEC;

After line 2043, add

             hints.ai_flags |= AI_ADDRCONFIG;

Does that help?
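
To illustrate what the suggested flag does outside of NSPR, here is a minimal sketch that calls POSIX getaddrinfo() directly with and without AI_ADDRCONFIG. The helper name count_addresses is hypothetical; the hints mirror the prnetdb.c lines quoted above only loosely.

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Counts the addresses getaddrinfo() returns for `host`.
 * When `use_addrconfig` is non-zero, AI_ADDRCONFIG is added to the
 * hints, which asks the resolver to return only address families
 * that are configured on the local system.
 * Returns -1 if resolution fails. */
int count_addresses(const char *host, int use_addrconfig)
{
    struct addrinfo hints, *res, *ai;
    int count = 0;

    assert(host != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;       /* accept IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM;   /* one entry per address */
    if (use_addrconfig)
        hints.ai_flags |= AI_ADDRCONFIG;

    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL; ai = ai->ai_next)
        count++;
    freeaddrinfo(res);
    return count;
}
```

Comparing count_addresses(host, 0) with count_addresses(host, 1) for the affected hostname shows whether the flag filters anything on a given machine.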

Also, please edit tstclnt.c, and go to these lines:

688         do {
689             enumPtr = PR_EnumerateAddrInfo(enumPtr, addrInfo, portno, &addr);
690         } while (enumPtr != NULL &&
691                  addr.raw.family != PR_AF_INET &&
692                  addr.raw.family != PR_AF_INET6);

After line 689, add:

                if (enumPtr != NULL)
                    printHostNameAndAddr(host, &addr);

What messages of the kind
  tstclnt: connecting to macmini-intel.local:8111 (address=xxxx) 
does tstclnt say?  I'm only interested in the xxx part in the
parentheses.

Thanks a lot for your help.
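
The same kind of debugging output can be produced without NSPR. A minimal sketch, assuming plain POSIX getaddrinfo()/getnameinfo() as a stand-in for PR_EnumerateAddrInfo; the names list_addresses and ADDR_TEXT_LEN are hypothetical:

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Room for an IPv6 literal plus a "%zone" suffix. */
enum { ADDR_TEXT_LEN = 64 };

/* Resolves `host` and writes up to `max` numeric address strings into
 * `out`, in the order getaddrinfo() returns them. Entries that are
 * neither IPv4 nor IPv6 are skipped, like the loop in tstclnt does.
 * Returns the number of strings written, or -1 if resolution fails. */
int list_addresses(const char *host, char out[][ADDR_TEXT_LEN], int max)
{
    struct addrinfo hints, *res, *ai;
    int n = 0;

    assert(host != NULL && out != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL && n < max; ai = ai->ai_next) {
        if (ai->ai_family != AF_INET && ai->ai_family != AF_INET6)
            continue;
        /* NI_NUMERICHOST formats the address without a reverse lookup. */
        if (getnameinfo(ai->ai_addr, ai->ai_addrlen, out[n], ADDR_TEXT_LEN,
                        NULL, 0, NI_NUMERICHOST) == 0)
            n++;
    }
    freeaddrinfo(res);
    return n;
}
```

Printing every entry, rather than only the first, shows whether an IPv4 address is also returned further down the list.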
Removed the entry from /etc/hosts to restore the original config.

Running
DYLD_LIBRARY_PATH=../lib/ ./selfserv -D -p 8573 -d /Users/kengert/tcp-bug/server -n macmini-intel.local -e macmini-intel.local-ec -w nss -r -v

and
DYLD_LIBRARY_PATH=../lib/ ./tstclnt -p 8573 -h macmini-intel.local  -q -d /Users/kengert/tcp-bug/client -v < /Users/kengert/tcp-bug/sslreq.dat

where sslreq.dat contains
GET / HTTP/1.0
(+extra linefeed)


(In reply to comment #2)
> Kai, what is the exact timeout error message when
> tstclnt gives up eventually?

tstclnt: connecting to macmini-intel.local:8573 (address=fe80::1%lo0)
tstclnt: Client timed out while waiting for connection to server: Connection refused by peer.


> Can you edit mozilla/nsprpub/pr/src/misc/prnetdb.c,
> and go to the following lines:
> 
> 2041         memset(&hints, 0, sizeof(hints));
> 2042         hints.ai_flags = (flags & PR_AI_NOCANONNAME) ? 0: AI_CANONNAME;
> 2043         hints.ai_family = (af == PR_AF_INET) ? AF_INET : AF_UNSPEC;
> 
> After line 2043, add
> 
>              hints.ai_flags |= AI_ADDRCONFIG;
> 
> Does that help?

No, same output.

tstclnt: connecting to macmini-intel.local:8573 (address=fe80::1%lo0)
tstclnt: Client timed out while waiting for connection to server: Connection refused by peer.


> After line 689, add:
>                 if (enumPtr != NULL)
>                     printHostNameAndAddr(host, &addr);

This duplicates the connection messages, now I get:

tstclnt: connecting to macmini-intel.local:8573 (address=fe80::1%lo0)
tstclnt: connecting to macmini-intel.local:8573 (address=fe80::1%lo0)
tstclnt: Client timed out while waiting for connection to server: Connection refused by peer.
Attached patch: test patch
For reference I attach the patch used for the previous test.
Firefox 4.0/4.x blocker bug 614526 seems to be related.
In my understanding, the problem is that

  selfserv binds on ipv4 only, but not on ipv6

  tstclnt looks up an ipv6 address, and connects using an ipv6 socket
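
The mismatch can be reproduced with plain sockets. A minimal sketch, assuming POSIX sockets instead of NSPR and using the loopback addresses 127.0.0.1 and ::1 rather than the link-local address from this report; both helper names are hypothetical:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Opens an IPv4-only listener on 127.0.0.1 with a kernel-chosen port.
 * Returns the socket and stores the port (host byte order) in *port,
 * or returns -1 on error. */
int listen_ipv4_loopback(unsigned short *port)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);
    int fd;

    assert(port != NULL);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = 0;   /* let the kernel choose a free port */
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) != 0 ||
        listen(fd, 1) != 0 ||
        getsockname(fd, (struct sockaddr *)&sa, &len) != 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(sa.sin_port);
    return fd;
}

/* Tries to connect to the loopback address of `family` (AF_INET or
 * AF_INET6) on `port`. Returns 0 on success, otherwise the errno of
 * the call that failed. */
int try_loopback_connect(int family, unsigned short port)
{
    struct sockaddr_in sa4;
    struct sockaddr_in6 sa6;
    int fd, rc, err;

    fd = socket(family, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;
    if (family == AF_INET6) {
        memset(&sa6, 0, sizeof(sa6));
        sa6.sin6_family = AF_INET6;
        sa6.sin6_addr = in6addr_loopback;
        sa6.sin6_port = htons(port);
        rc = connect(fd, (struct sockaddr *)&sa6, sizeof(sa6));
    } else {
        memset(&sa4, 0, sizeof(sa4));
        sa4.sin_family = AF_INET;
        sa4.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        sa4.sin_port = htons(port);
        rc = connect(fd, (struct sockaddr *)&sa4, sizeof(sa4));
    }
    err = (rc == 0) ? 0 : errno;   /* save errno before close() */
    close(fd);
    return err;
}
```

With the listener open, try_loopback_connect(AF_INET, port) succeeds, while try_loopback_connect(AF_INET6, port) fails; on a system with IPv6 enabled the error is typically ECONNREFUSED, which matches the "Connection refused by peer" text in the tstclnt output.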
Attached patch: Patch v2 to selfserv (obsolete)
As said before, I believe the problem is not with tstclnt. The problem is that tstclnt attempts to connect to a hostname that resolves to an IPv6 address, and nobody listens on an IPv6 socket.

I think selfserv should be enhanced, it should be able to listen on a IPv6 socket.

I'm attaching a patch that adds a new parameter to selfserv.
The parameter -I is used to pass a hostname to selfserv.
Selfserv will use a similar test as found in tstclnt.

If selfserv learns the given hostname resolves to an IPv6 address, then selfserv will listen on an IPv6 socket.
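
The decision described here can be sketched as follows. This is not the attached patch; it is a minimal illustration that uses POSIX getaddrinfo() instead of NSPR, and the function name listen_family_for_host is hypothetical:

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Resolves `host` and returns the address family of the first IPv4 or
 * IPv6 entry: AF_INET6 means the server should listen on an IPv6
 * socket, AF_INET means an IPv4 socket is enough.
 * Returns -1 if the name cannot be resolved. */
int listen_family_for_host(const char *host)
{
    struct addrinfo hints, *res, *ai;
    int family = -1;

    assert(host != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;
    /* Take the first usable entry, which is also what a client that
     * connects to the first resolved address would pick. */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        if (ai->ai_family == AF_INET || ai->ai_family == AF_INET6) {
            family = ai->ai_family;
            break;
        }
    }
    freeaddrinfo(res);
    return family;
}
```

Because server and client resolve the same name the same way, both end up on the same address family.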


Using this patch to selfserv, 
with the test scenario reported in this bug, 
using tstclnt without modifications,
the connection to fe80::1%lo0 succeeds.


I have two proposals:

(a)
- use this patch v2
- in addition, change the NSS test suite, and whenever we start a selfserv,
  use the new additional -I <bind-host> parameter

(b)
- enhance this patch v2 to automatically decide about IPv6 vs. IPv4 
  based on the hostname found in the environment variables


My preference is (a), because currently the HOSTNAME parameter is used in the test suite scripts, but never accessed from NSS cmd C code.
Attachment #499015 - Flags: review?(wtc)
I applied the attached patch to the test scripts together with Kai's patch; connections worked and all test suites passed on Fedora 14. Tested with IPv6 both on and off.
Attachment #499313 - Flags: review?(wtc)
My test invocation that enables all test suites and sets HOSTNAME:
export HOSTNAME=localhost.localdomain
HOST=localhost DOMSUF=localdomain PORT=$MYRAND NSS_CYCLES="" NSS_TESTS="" NSS_SSL_TESTS="" NSS_SSL_RUN="" ./all.sh
For UNIX/Linux, one could also say export HOSTNAME=`uname -n`
Both patches combined give me a successful build and test run on the original Mac OSX machine, too.
This patch combines Elio's patch and my patch.

I've removed a debug-printf that was left in.
Attachment #499015 - Attachment is obsolete: true
Attachment #499313 - Attachment is obsolete: true
Attachment #499383 - Flags: review?(wtc)
Attachment #499015 - Flags: review?(wtc)
Attachment #499313 - Flags: review?(wtc)
(In reply to comment #6)
Kai Engert wrote:
> In my understanding, the problem is that
> 
>   selfserv binds on ipv4 only, but not on ipv6
> 
>   tstclnt looks up a ipv6 ip address, and connects using an ipv6 socket

Thank you for tracking down this bug.  You are right on.
You rediscovered a known bug of selfserv: bug 388117.
(Bug 366614 is the tracking bug on IPv6 support in NSS.)

The best fix is to finish Nelson's work on bug 388117,
which will allow selfserv to listen on both IPv4 and IPv6.
For some unknown reason, Nelson's patch caused test failures
on some Tinderboxes and was backed out (see bug 388117
comment 26).  See also the IP-version agnostic sample server at
http://msdn.microsoft.com/en-us/library/ms738639%28v=VS.85%29.aspx

As a short-term fix for Linux (NSS package self test),
selfserv can open a dual-stack AF_INET6 listening socket,
which can serve both IPv4 and IPv6 clients.
(In reply to comment #12)
This bug is now blocking testing for RHEL 6, where we are required to enable all test suites as part of the QE acceptance.
> 
> As a short-term fix for Linux (NSS package self test),
> selfserv can open a dual-stack AF_INET6 listening socket,
> which can serve both IPv4 and IPv6 clients.

Wan-Teh, could you post a patch to selfserv with this short-term fix? I will gladly pick it up. Thanks in advance.
Elio: this patch should allow you to run NSS all.sh when you
build Fedora's NSS package.  Please test it on Linux.

It changes selfserv to use a dual-stack IPv6 listening socket, which
can accept connections from both IPv4 and IPv6 clients.  NSPR's
IPv6 sockets have the IPV6_V6ONLY socket option default to false.
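
For readers unfamiliar with dual-stack sockets, a minimal sketch of the idea with plain POSIX sockets (not the attached NSPR-based patch). It clears IPV6_V6ONLY explicitly, since the operating system default varies; the function name open_dual_stack_listener is hypothetical:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Opens an AF_INET6 listening socket on the wildcard address with
 * IPV6_V6ONLY cleared, so that IPv4 clients are accepted as well
 * (they show up as IPv4-mapped IPv6 addresses).
 * Returns the socket, or -1 on error. */
int open_dual_stack_listener(unsigned short port)
{
    struct sockaddr_in6 sa;
    int fd, off = 0, on = 1;

    fd = socket(AF_INET6, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;   /* e.g. IPv6 is not available on this system */

    if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &off, sizeof(off)) != 0 ||
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) != 0) {
        close(fd);
        return -1;
    }

    memset(&sa, 0, sizeof(sa));
    sa.sin6_family = AF_INET6;
    sa.sin6_addr = in6addr_any;     /* "::" covers IPv4 too when dual-stack */
    sa.sin6_port = htons(port);
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) != 0 ||
        listen(fd, 5) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Not every platform allows IPV6_V6ONLY to be cleared, so the setsockopt() failure path matters in practice.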
Attachment #499383 - Flags: review?(wtc) → review-
Comment on attachment 514702
Patch for Fedora's NSS package (DO NOT CHECK IN)

Kai, if you still have that Mac, please test this patch on it.
The patch should also solve the problem on your Mac.  Thanks.
It works for Fedora (Rawhide/F15/F14) and RHEL-6.
The proper fix for selfserv, using the approach described in
http://msdn.microsoft.com/en-us/library/ms738639%28v=VS.85%29.aspx ,
requires the new NSPR wrapper function for getaddrinfo proposed
in bug 636504 to determine whether selfserv needs to open an
IPv4 or pure-IPv6 (not dual-stack) listening socket, or both.
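
The NSPR wrapper proposed in bug 636504 is not shown in this thread. As a rough illustration of the underlying idea only, this sketch asks POSIX getaddrinfo() for passive (wildcard) addresses and reports which families a server would have to listen on; the function name passive_families is hypothetical:

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Asks getaddrinfo() for wildcard addresses suitable for bind()
 * (AI_PASSIVE with a NULL host name) and reports which address
 * families were returned for `service` (a port number as text).
 * Returns 0 on success, -1 if the lookup fails. */
int passive_families(const char *service, int *has_v4, int *has_v6)
{
    struct addrinfo hints, *res, *ai;

    assert(service != NULL && has_v4 != NULL && has_v6 != NULL);
    *has_v4 = 0;
    *has_v6 = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;

    if (getaddrinfo(NULL, service, &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        if (ai->ai_family == AF_INET)
            *has_v4 = 1;
        else if (ai->ai_family == AF_INET6)
            *has_v6 = 1;
    }
    freeaddrinfo(res);
    return 0;
}
```

A server written this way opens one listening socket per returned entry instead of hard-coding a family.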
Depends on: 636504
(1)
Wan-Teh, I still don't understand which "works always" strategy you're proposing, even after I have attempted to implement bug 636504.

(2)
Your earlier argument in this bug was, it should not be necessary to pass an argument to selfserv, in order to make things work.
I personally think your request unnecessarily makes life much harder for us.

We already pass the hostname to tstclnt, and this is what the tstclnt will connect to.

But on the other hand, you require that selfserv will be smart enough to guess on its own where to listen.

I would prefer to avoid additional work and take the patch that I have made, because it works, and because it gives equal information to both selfserv and tstclnt.


After having attempted to implement bug 636504 (which might not be what you had in mind), I learn it still doesn't work for selfserv.

Well, maybe my patch is wrong. But with the patch, selfserv still listens on an ipv4 server socket, while the hostname that tstclnt is using, "macmini-intel.local", is still resolved by nspr to an ipv6 target address.


Sorry, I have no idea how to solve this bug.

I guess it could be solved by using a socket that listens on both stacks. But then we need code that reliably tells us when we should do that, when it's safe or not, and how to do it. I don't know if the approach to listen on both stacks is compatible with all platforms we have to support.


Sorry, I believe I have spent a lot of time on this already, and have even provided a fully working solution, and I still don't see clearly how exactly this should be solved to make you comfortable with the solution.
(In reply to comment #18)
> (2)
> Your earlier argument in this bug was, it should not be necessary to pass an
> argument to selfserv, in order to make things work.
> I personally think your request unnecessarily makes life much harder for us.

It would be useful to at least be *able* to pass an argument to selfserv to force it to be IPv4-only or IPv6-only, for (ad-hoc) testing.

> I guess it could be solved by using a socket that listens on both stacks.
> But then we need code that reliably tells us when we should do that,
> when it's safe or not, and how to do it. I don't know if the approach
> to listen on both stacks is compatible with all platforms we have
> to support.

AFAICT: If the target platform is Windows and if GetVersionEx() returns 5.0 (or 5.2?) then you need to try to open two (one ipv4, one ipv6) but either one may fail. Otherwise you can use one as in attachment 514702.
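
The "open two, either one may fail" variant could look roughly like this. It is a minimal sketch with POSIX sockets, not Windows or NSPR code, and the names open_listener and open_listener_pair are hypothetical:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Opens a wildcard listener for one address family. The IPv6 socket
 * is made IPv6-only so it does not also claim the IPv4 port.
 * Returns the socket, or -1 on error. */
static int open_listener(int family, unsigned short port)
{
    struct sockaddr_storage ss;
    socklen_t len;
    int fd, on = 1;

    fd = socket(family, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&ss, 0, sizeof(ss));
    if (family == AF_INET6) {
        struct sockaddr_in6 *sa6 = (struct sockaddr_in6 *)&ss;
        sa6->sin6_family = AF_INET6;
        sa6->sin6_addr = in6addr_any;
        sa6->sin6_port = htons(port);
        len = sizeof(*sa6);
        if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on)) != 0) {
            close(fd);
            return -1;
        }
    } else {
        struct sockaddr_in *sa4 = (struct sockaddr_in *)&ss;
        sa4->sin_family = AF_INET;
        sa4->sin_addr.s_addr = htonl(INADDR_ANY);
        sa4->sin_port = htons(port);
        len = sizeof(*sa4);
    }
    if (bind(fd, (struct sockaddr *)&ss, len) != 0 || listen(fd, 5) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Tries to open an IPv4 listener (fds[0]) and an IPv6-only listener
 * (fds[1]) on `port`. A family that cannot be opened is stored as -1.
 * Returns the number of listeners opened (0, 1 or 2). */
int open_listener_pair(unsigned short port, int fds[2])
{
    assert(fds != NULL);
    fds[0] = open_listener(AF_INET, port);
    fds[1] = open_listener(AF_INET6, port);
    return (fds[0] >= 0) + (fds[1] >= 0);
}
```

A caller would treat a return value of 0 as fatal and otherwise poll whichever descriptors are valid. Passing port 0 gives each socket a different kernel-chosen port, so a real server would pass a fixed port.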
This problem persists even in the latest version. tstclnt works fine for IPv4 but fails for IPv6 with a selfserv connection timeout error. The tstclnt code mentioned above has changed substantially since then, so I am not able to use the patches mentioned above. Kindly help; I am attaching the relevant logs of the test failure.

Platform: CentOS release 6.5

This is what I got on running the NSS test suite.
.......................................................................................................................................

selfserv -D -p 8443 -d ../server -n localhost6.localdomain6  \
         -e localhost6.localdomain6-ec -w nss -r -i ../tests_pid.17821  &
trying to connect to selfserv at Tue Jun  3 15:03:24 IST 2014
tstclnt -p 8443 -h localhost6.localdomain6  -q \
        -d ../client -v < /home/nitin/Sachin_NSS/NSS-IPV61/nss-3.16-with-nspr-4.10.4/nss-3.16/nss/tests/ssl/sslreq.dat
tstclnt: connecting to localhost6.localdomain6:8443 (address=::1)
tstclnt: Client timed out while waiting for connection to server: PR_CONNECT_RESET_ERROR: TCP connection reset by peer
retrying to connect to selfserv at Tue Jun  3 15:04:30 IST 2014
tstclnt -p 8443 -h localhost6.localdomain6  -q \
        -d ../client -v < /home/nitin/Sachin_NSS/NSS-IPV61/nss-3.16-with-nspr-4.10.4/nss-3.16/nss/tests/ssl/sslreq.dat
tstclnt: connecting to localhost6.localdomain6:8443 (address=::1)
tstclnt: Client timed out while waiting for connection to server: PR_CONNECT_RESET_ERROR: TCP connection reset by peer
ssl.sh: #727: Waiting for Server - FAILED

..........................................................................................................................................
Flags: needinfo?(kaie)
Sachin, I have explained the situation from my point of view in comment 18, there is nothing else I can add. It needs a proposal for a compromise from another NSS peer to make progress.
Flags: needinfo?(kaie)

So we've been carrying wtc's patch in our own code for 8 years now. We'd like to fold something upstream. There are 3 possibilities:

  1. fold the code with ifdef LINUX (and possibly MacOS).
  2. advance the patch Kai recommends in bug 636504.
  3. build a full dual socket implementation for selfserv (à la Windows).

I'd prefer 1 because I know I can make that work. Two is acceptable. Three may be ideal, but more work with less real benefit.
I'm soliciting input here. I'll proceed with 1 if I hear none.
bob

(In reply to Robert Relyea from comment #22)

> So we've been carrying wtc patch in our own code now from 8 years.

Which patch did you carry?
Is it attachment "combined patch v3" which wtc has marked r- ?

No, the one wtc marked 'don't check in, for Fedora only'. It also has some code to deal with the case where the OS has enabled IPv6, but the address is still IPv4 (in which case NSPR uses the IPv6 emulator socket, which you can't set as inheritable). The current patch checks whether the socket is pure NSPR and drops back to IPv4 if it's not.

bob
