Open Bug 1912610 Opened 6 months ago Updated 12 hours ago

Misleading error page when lacking IPv6 while visiting IPv6-only website

Categories

(Core :: Networking, defect, P2)

Firefox 129
defect

People

(Reporter: namedrelay, Assigned: sekim)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged][necko-priority-next])

Steps to reproduce:

I tried visiting a website and it gave me an error message.
The website doesn't work at home, apparently because my ISP doesn't provide IPv6. It does work at a friend's home, where the ISP is different.

Reproduce:

  1. Be on an IPv4-only network. (So https://test-ipv6.com/ should report a 0/10)
  2. Visit an IPv6-only website:
    2a. https://game.flyingpenguintech.org/
    2b. https://loopsofzen.uk/
    2c. https://clintonwhitehouse2.archives.gov/

Actual results:

Verbatim:
"Hmm. We’re having trouble finding that site.
We can’t connect to the server at game.flyingpenguintech.org.

If you entered the right address, you can:

  • Try again later
  • Check your network connection
  • Check that Firefox has permission to access the web (you might be connected but behind a firewall)"

This error message is entirely wrong and misleading:

  1. I DID enter the right address. (my VPS has IPv6 and curl-ing the exact name returns an HTML page)
  2. Trying again later will NOT work, as the website is already online.
  3. There's nothing wrong with your internet connection. (WiFi works)
  4. Obviously, there is no firewall issue since other websites just work.

You will often see this page when mistyping a URL or when the website is just gone. Thus you will not realize that the fault lies with the ISP, which is not providing a proper internet connection. People don't even realize there's a problem in the first place and will mistakenly think that the website no longer exists.

Expected results:

I would have expected Firefox to give a proper error page with realistic solutions.
At the very least there should be an indication that the website DOES exist, but that the user can't access it due to limitations of their internet connection.

I would suggest something like this:
"You're limited by the technology of your time

This website does exist, but we cannot connect to it. If you want to visit this website, your internet connection requires "IPv6", something your ISP or network administrator should be providing. (But it appears they don't, or there is something wrong with your network.)

If you want to visit the site, you can do the following:

  • Use another network. (For example: roaming, campus/public network, ask your neighbour)
  • Ask your ISP or network administrator to enable IPv6.
  • Use a VPN or tunnel that provides IPv6."

I am aware of https://bugzilla.mozilla.org/show_bug.cgi?id=625710 , but it doesn't match the issue I am having.
(It matches on the technical side, but it's completely off on the UX issue and severity.)
If you wish to mark my bug as a duplicate, please do copy the bug description above for visibility.

Component: Untriaged → Networking
Product: Firefox → Core

In addition to the 14-year-old bug 625710, there is also the more recent bug 1681527, which I just found.

They are both marked as enhancements, which made sense 14 years ago, but not today.
IPv4 is running on fumes right now, and an increasing number of new websites are IPv6-only.

The lack of IPv6 deployment among ISPs will cause confusion for website owners and visitors, as those sites will work in some places but not in others.
I'd like to avoid situations like "My new website works at home and at work, but why does it not work at my friend's place?"
These users deserve a proper explanation as to why the website doesn't work.

Additionally, this lack of deployment is probably happening because ISPs are not even aware it is causing trouble for their customers.
They don't get any user feedback on their lack of IPv6 because regular users aren't informed that the issue lies with their ISP in the first place.

Even I didn't realize this until the moment I tried visiting my own IPv6-only website!
I wonder how many websites I unknowingly failed to visit due to this bug...

Thanks for the bug. We definitely have multiple of these reports. We're doing an overhaul of the error pages, so this could fit into that effort.

Blocks: necko-error
Severity: -- → S3
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P2
Whiteboard: [necko-triaged][necko-priority-review]

(In reply to Valentin Gosu [:valentin] (he/him) from comment #3)

Thanks for the bug. We definitely have multiple of these reports. We're doing an overhaul of the error pages, so this could fit into that effort.

You're welcome.

This issue has recently become more relevant due to several factors.
(The EU is stepping up its "digital decade" efforts, several hosting providers are now making IPv4 optional due to costs, etc.)
If not dealt with, it will probably have unexpected consequences sooner rather than later.

Anyways, what did you think of my suggested error message? :-)

Additionally, the tab title should probably be "Problem loading page" instead of the current "Server Not Found".
(Adding this before it's forgotten)

Whiteboard: [necko-triaged][necko-priority-review] → [necko-triaged]
Assignee: nobody → sekim
Whiteboard: [necko-triaged] → [necko-triaged][necko-priority-next]

What is the status of this defect?
And is it likely that it will be fixed within a few months?

I am asking because I encountered someone asking about this very problem on Reddit today.
Link: https://www.reddit.com/r/firefox/comments/1i5qky2/why_does_a_browser_not_tell_if_a_website_is_only/

As this defect has atypical symptoms, people who are affected don't realize there actually is a bug.
(But this does not mean there is no impact! There very much is, but it is simply not attributed to Firefox.)

(In reply to namedrelay from comment #6)

What is the status of this defect?
And is it likely that it will be fixed within a few months?

I am asking because I encountered someone asking about this very problem on Reddit today.
Link: https://www.reddit.com/r/firefox/comments/1i5qky2/why_does_a_browser_not_tell_if_a_website_is_only/

As this defect has atypical symptoms, people who are affected don't realize there actually is a bug.
(But this does not mean there is no impact! There very much is, but it is simply not attributed to Firefox.)

It is currently in the backlog, but can be reprioritized. I am planning to take a look soon - hopefully aiming to land it by 137-138 :)

Valentin, any thoughts/ideas on this bug? Seems like we need to introduce a new error code for this case of NS_ERROR_UNKNOWN_HOST and proceed with a different neterror message.

Flags: needinfo?(valentin.gosu)

(In reply to Sean Kim from comment #8)

Valentin, any thoughts/ideas on this bug? Seems like we need to introduce a new error code for this case of NS_ERROR_UNKNOWN_HOST and proceed with a different neterror message.

Not Valentin, but I do have some ideas/input.

As far as I know, Firefox currently checks for IPv6 connectivity and skips the AAAA lookups if there isn't any.
And this is causing the problem, because IPv6-only websites only have AAAA records, resulting in a "no records found" situation.
The first step in solving this would be to make the AAAA lookup happen regardless of whether there is IPv6 connectivity.
Then you should get an "Unable to connect" error instead of a "Server not found", which would be a major improvement.

Detecting an IP mismatch and showing a specific error message should happen later in the process.
This should be a different error message than "Unable to connect", as we KNOW what the issue is and can offer practical alternatives to the user.
I already listed the alternatives in the initial report. (the 3 bullet points at the bottom)

Keep in mind that there is also a scenario where the user doesn't have IPv4 connectivity.
(Especially in third world countries where even today a working CG-NAT is not guaranteed!)
It's the same kind of check and error page, so it should be very copy-pasteable.

(In reply to Sean Kim from comment #8)

Seems like we need to introduce a new error code for this case of NS_ERROR_UNKNOWN_HOST and proceed with a different neterror message.

Is it possible to detect IPv6 capability and IPv6-only website in the error page instead of the necko layer? The detection may be expensive. Do we have to detect them even if the user will never see the error (e.g. xhr/fetch access)?

(In reply to namedrelay from comment #9)

The first step in solving this would be to make the AAAA lookup happen regardless of whether there is IPv6 connectivity.

Do we have to make an AAAA query even before a host not found is returned? We can send an AAAA query in the error page and change the error message if the website is IPv6-only and the current network is IPv4-only (or vice versa). This way we can avoid a performance penalty for most (non-error) accesses.
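
The decision being discussed here can be sketched in plain JavaScript. This is only an illustration under assumed inputs, not Firefox code: `classifyLoadFailure`, its parameter names, and the returned labels are all hypothetical. It assumes the A and AAAA answers are collected only after a load has already failed, so successful loads pay no extra cost.

```javascript
// Hypothetical helper (not a Firefox API): pick an error page for a failed
// load from the DNS answers and the detected connectivity.
function classifyLoadFailure({ aRecords, aaaaRecords, hasIPv4, hasIPv6 }) {
  const hasA = aRecords.length > 0;
  const hasAAAA = aaaaRecords.length > 0;

  if (!hasA && !hasAAAA) {
    return "server-not-found"; // the name does not resolve at all
  }
  if (hasAAAA && !hasA && !hasIPv6) {
    return "site-needs-ipv6"; // IPv6-only site, IPv4-only client
  }
  if (hasA && !hasAAAA && !hasIPv4) {
    return "site-needs-ipv4"; // IPv4-only site, IPv6-only client
  }
  // A usable address family exists, so the failure has another cause.
  return "unable-to-connect";
}
```

With these assumptions, an IPv6-only site visited from an IPv4-only network is classified as "site-needs-ipv6" instead of "server-not-found".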

There are actually quite a lot of IPv6-only sites out there, a lot more than I expected. I wrote a script to resolve names based on publicly available DNS zone files (it's only checking for domain and www.domain) and it has already found thousands of hosts which have AAAA records and no A records:

https://www.ev6.net/v6sites.php

It's still running as the script is fairly slow, and the list of domains is massive.

Users will undoubtedly be hitting this and having no idea why these sites are failing to load.

AAAA lookup should be disabled if:

  • Firefox uses an HTTP/HTTPS proxy,
  • Firefox uses SOCKS v4 and "Proxy DNS when using SOCKS v4" is checked (SOCKS v4a), or
  • Firefox uses SOCKS v5 and "Proxy DNS when using SOCKS v5" is checked.

This is because we can't tell whether the proxy/SOCKS server can access IPv6-only origin servers.

Also, we should behave as if we are on an IPv4-only network if SOCKS v4 is used, because SOCKS v4 does not support IPv6. In theory, a SOCKS v4a server can access IPv6-only origin servers if the URL uses a host name instead of an IPv6 address.
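
The proxy rules listed above could be written down roughly as follows. This is a sketch with a made-up settings object: the `type` and `proxyDns` fields are illustrative and are not Firefox preference names.

```javascript
// Hypothetical settings shape: { type: "none" | "http" | "https" | "socks4" | "socks5",
//                               proxyDns: boolean }
// Returns true when the extra AAAA check would tell us something useful.
function shouldRunAaaaCheck(proxy) {
  switch (proxy.type) {
    case "none":
      return true; // direct connection: our own DNS and sockets decide
    case "http":
    case "https":
      return false; // the proxy resolves and connects on our behalf
    case "socks4":
    case "socks5":
      // With remote DNS we cannot see what the SOCKS server can reach.
      return !proxy.proxyDns;
    default:
      return false; // unknown setup: stay conservative
  }
}
```

The default branch is a design choice of this sketch: when the proxy setup is not understood, skipping the check avoids showing a wrong explanation.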

All DNS lookups, not just AAAA, should be disabled when a proxy is used for DNS, aside from looking up the proxy itself. If you're in an environment where use of a proxy is required then DNS may not work at all, or may only resolve internal hosts.

Depending on environment, you might need an AAAA lookup to reach the proxy. If you blindly disable AAAA lookups because a proxy is configured, this would break.

Error messages should highlight that a proxy is in use and explain the potential side effects. In many cases errors will be generated by the proxy itself and not the browser.

Even when using a proxy, a connection test can be performed. For instance Windows attempts to download https://ipv6.msftconnecttest.com/connecttest.txt and uses this to determine whether it has IPv6 connectivity or not. This works just fine through a proxy. Firefox should really implement something similar to Windows whereby it tests connectivity like this and reports it to the user - possibly with a status icon and a link to an FAQ.

There are many scenarios where a user may end up on an IPv6-only network, which may or may not have access to legacy sites through a mechanism such as DNS64/NAT64. Outages are also common, and legacy networks are a lot more fragile due to all the kludges (NAT, address conservation, lack of DHCP leases, etc.), so it's a frequent scenario that a user who normally has dual stack experiences an outage of legacy IP while v6 is still working.

If Firefox believes the user is on a network with only partial connectivity, it should inform/warn the user. Currently the user is not informed, and is left guessing why some sites fail to load with cryptic error messages.

The Windows implementation of this is functional but not very user friendly. The user has to manually go to the network interface properties, and is expected to know what IPv* mean. A more user friendly approach would be preferable.

(In reply to bert from comment #13)

Depending on environment, you might need an AAAA lookup to reach the proxy. If you blindly disable AAAA lookups because a proxy is configured, this would break.

I mean we should disable the check that we are going to add in the error page. It does not affect any existing network connection because the query will be sent only if the connection failed and currently we do not send the AAAA query at all.

Even when using a proxy, a connection test can be performed. For instance Windows attempts to download https://ipv6.msftconnecttest.com/connecttest.txt and uses this to determine whether it has IPv6 connectivity or not. This works just fine through a proxy.

It can detect if the current network has IPv6 connectivity and Firefox is already doing a similar check (access to http://detectportal.firefox.com/success.txt?ipv6).

But it cannot detect whether the origin server provides an IPv6 connection if a proxy or SOCKS server is configured. We also have to check the latter.

The problem with detectportal.firefox.com is it's dual stack, so if you're accessing it through a proxy you have no way to tell which protocol was used.

The Microsoft approach uses two separate hostnames, both of which are single stack, so it can tell if one failed while the other succeeded.
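
A minimal sketch of how two single-stack probe results could be interpreted. The probe URLs are placeholders under the reserved `.example` domain, not real Mozilla or Microsoft endpoints, and `interpretProbes` is a hypothetical name.

```javascript
// Placeholder single-stack probe hosts: the first is assumed to have only an
// A record, the second only an AAAA record.
const PROBES = Object.freeze({
  ipv4: "http://ipv4only.probe.example/success.txt",
  ipv6: "http://ipv6only.probe.example/success.txt",
});

// results maps each probe URL to true (fetched) or false (failed).
function interpretProbes(results) {
  const v4 = results[PROBES.ipv4] === true;
  const v6 = results[PROBES.ipv6] === true;
  if (v4 && v6) return "dual-stack";
  if (v4) return "ipv4-only";
  if (v6) return "ipv6-only";
  return "no-connectivity";
}
```

Because each hostname resolves for one address family only, a proxy that fetches them is forced onto that family, so one probe can fail while the other succeeds.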

(In reply to bert from comment #15)

The problem with detectportal.firefox.com is it's dual stack, so if you're accessing it through a proxy you have no way to tell which protocol was used.

It is possible to implement the IPv6 connectivity check on a single dual stack host. That said, the current http://detectportal.firefox.com/success.txt?ipv6 seems to be broken. It always returns success, even on an IPv4-only network.

Edit: I made a test implementation: https://emk.name/test/ipv6test.cgi

(In reply to Masatoshi Kimura [:emk] from comment #16)

It is possible to implement the IPv6 connectivity check on a single dual stack host. That said, the current http://detectportal.firefox.com/success.txt?ipv6 seems to be broken. It always returns success, even on an IPv4-only network.

Not really...
Currently it just seems to return a text file and ignore any parameters. On a dual stack or IPv6-only network it's always fetched over v6 and always returns success even when ?ipv4 is specified.

You could implement logic on the server side to parse the ?ipv6 parameter and compare it with the peer address accessing the server, but this would break in proxy setups as the client has no way to tell the proxy which protocol to use. The only way to influence what protocol the proxy used would be to use two different single-stack DNS names, as MS have done. This would also be a trivial change, as you'd just need to create two new DNS records.

The single host approach is also likely to break in NAT64 scenarios, as the client would be using an IPv6 socket to access a legacy host.
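
For reference, the server-side comparison described above might look like this. It is a sketch, not the logic of detectportal.firefox.com or of the test implementation linked earlier; `peerFamily` and `checkProbe` are hypothetical names, and the peer address is assumed to come straight from the accepted socket.

```javascript
// Returns "ipv4", "ipv6", or null for the family the peer connected with.
function peerFamily(peerAddress) {
  // Dual-stack listening sockets report IPv4 peers as ::ffff:a.b.c.d.
  if (/^::ffff:\d{1,3}(?:\.\d{1,3}){3}$/i.test(peerAddress)) return "ipv4";
  if (peerAddress.includes(":")) return "ipv6";
  if (/^\d{1,3}(?:\.\d{1,3}){3}$/.test(peerAddress)) return "ipv4";
  return null;
}

// query is the raw query string, e.g. "?ipv6".
function checkProbe(query, peerAddress) {
  const requested = query.replace(/^\?/, "");
  if (requested !== "ipv4" && requested !== "ipv6") {
    return "success"; // no family requested, nothing to compare
  }
  return peerFamily(peerAddress) === requested ? "success" : "failure";
}
```

The limits raised in this comment still apply: behind a proxy the peer address belongs to the proxy, and behind NAT64 a client using an IPv6 socket can appear to the server as an IPv4 peer.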

(In reply to bert from comment #17)

You could implement logic on the server side to parse the ?ipv6 parameter and compare it with the peer address accessing the server, but this would break in proxy setups as the client has no way to tell the proxy which protocol to use.

Why? In proxy setups, "the peer address" is the proxy's. So it can tell the proxy which protocol to use.

(In reply to Masatoshi Kimura [:emk] from comment #18)

(In reply to bert from comment #17)

You could implement logic on the server side to parse the ?ipv6 parameter and compare it with the peer address accessing the server, but this would break in proxy setups as the client has no way to tell the proxy which protocol to use.

Why? In proxy setups, "the peer address" is the proxy's. So it can tell the proxy which protocol to use.

No, the peer address would be whatever the proxy chose to use. If you have a dual stack proxy and a dual stack target site then it is always going to connect to that site using its preferred protocol (usually v6) unless the connection fails. The proxy performs the DNS lookup and decides what protocol to use. The client has no way to tell the proxy which protocol to use, and the server can only get the address of whichever protocol the proxy has chosen to use.

For instance here with a dual stack proxy requesting a dual stack site, the connection is ALWAYS made over v6. The client has no way to influence that, so the client has absolutely no idea if legacy sites are reachable through the proxy or not.

The only way to influence what protocol the proxy will use, is by requesting a single-stack URL.

(In reply to Masatoshi Kimura [:emk] from comment #16)

It is possible to implement the IPv6 connectivity check on a single dual stack host. That said, the current http://detectportal.firefox.com/success.txt?ipv6 seems to be broken. It always returns success, even on an IPv4-only network.

The server will always return 'success' regardless of the query parameters.
The way Firefox uses the endpoint is that it creates a channel to that URL and sets the NS_HTTP_DISABLE_IPV4 flag on the channel.
The ?ipv6 part is in the URL only to differentiate the channels in logging and wireshark.


This specific bug covers showing the Unable to connect error page when the website is IPv6 only, and the computer doesn't have IPv6 connectivity.
I think we should be able to detect when the remote address of the failed channel is an IPv6 address, and then if the connectivity service reports no IPv6 connectivity we should show a different error message.

Here are some steps that we need to take:

  1. Implement a new RPMIsIPv6Connection() in NetErrorChild that checks if nsIHttpChannelInternal::remoteAddress contains ":".
  2. If it does, we need to call into NetErrorParent to check if the browser has detected IPv6 connectivity.

Something like this.
The parent check should be:

Cc[
  "@mozilla.org/network/network-connectivity-service;1"
].getService(Ci.nsINetworkConnectivityService).IPv6 == Ci.nsINetworkConnectivityService.NOT_AVAILABLE 

If this returns true, that means we are attempting to load an IPv6 page, and there's no IPv6 connectivity. In that case, we should be displaying a different message telling the user that the issue is caused by the lack of IPv6 connectivity on their computer.
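
Put together, the two steps could behave like the standalone sketch below. It does not use the real actors or XPCOM: `ConnectivityState` is a stand-in for the nsINetworkConnectivityService states with illustrative values, and `isIPv6Address` and `chooseErrorPage` are hypothetical names.

```javascript
// Stand-in for the connectivity service states; the values are illustrative.
const ConnectivityState = Object.freeze({
  UNKNOWN: "unknown",
  OK: "ok",
  NOT_AVAILABLE: "not-available",
});

// Step 1 (child side): does the failed channel's remote address look like IPv6?
function isIPv6Address(remoteAddress) {
  return typeof remoteAddress === "string" && remoteAddress.includes(":");
}

// Step 2 (parent side): only blame IPv6 when it is known to be unavailable.
function chooseErrorPage(remoteAddress, ipv6State) {
  if (
    isIPv6Address(remoteAddress) &&
    ipv6State === ConnectivityState.NOT_AVAILABLE
  ) {
    return "no-ipv6-connectivity";
  }
  return "generic-connection-failure";
}
```

An UNKNOWN state deliberately falls through to the generic page in this sketch, so the user is not told IPv6 is missing before the check has finished.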

Flags: needinfo?(valentin.gosu)

The way Firefox uses the endpoint is that it creates a channel to that URL and sets the NS_HTTP_DISABLE_IPV4 flag on the channel.

This doesn't work when using a proxy, as the proxy is responsible for performing DNS resolution and deciding what protocol to use.
The proxy will always use its preferred protocol to connect to the endpoint (and may fail over to the other in the event that the primary fails), and so long as at least one protocol is available both checks will succeed.
The proxy can easily be dual stack, even if the client only has a single stack connection to reach the proxy itself.

Implement a new RPMIsIPv6Connection() in NetErrorChild that checks if nsIHttpChannelInternal::remoteAddress contains ":".

As far as I understand it, Firefox does not attempt to do AAAA lookups if it does not believe IPv6 connectivity is available. As such, the remoteAddress will never contain ":" unless the user has explicitly provided an IPv6 address as the URL, which is very uncommon as 99% of all browsing is done via DNS names.

The more common scenario is that the user provides a hostname, Firefox attempts to perform an A lookup because it believes IPv6 connectivity is not available, does not find an address and fails without ever attempting to perform an AAAA lookup.
At https://www.ev6.net/v6sites.php you will find a list of over 50k hosts now which have AAAA records and no A records.

An AAAA lookup really needs to happen regardless, and appropriate error messages displayed if AAAA exists but A doesn't or is inaccessible.

There is also an inverse scenario, where a user has IPv6 connectivity but no legacy connectivity, and Firefox should be able to handle it too. I've seen this crop up quite a lot lately: legacy connectivity is often second class, provided through a NAT gateway while v6 is direct, so legacy connectivity breaks if the NAT gateway fails. Similarly, legacy DHCP pools can fill up (especially with mobile devices that now present random MAC addresses, so they get a new lease every time), which breaks legacy connectivity while v6 still works.

The use of NAT64/DNS64 is also extremely common on mobile networks, whereby an AAAA lookup will always succeed, so the link might be IPv6-only but legacy sites are still accessible anyway.

So it seems there are three problems:

  1. connectivity check does not handle proxies correctly and will return false results
  2. users are not given a useful error message if they try to access an unreachable site from a single stack network
  3. the connectivity check state is only stored internally and not displayed to the user

(In reply to bert from comment #21)

The way Firefox uses the endpoint is that it creates a channel to that URL and sets the NS_HTTP_DISABLE_IPV4 flag on the channel.

This doesn't work when using a proxy, as the proxy is responsible for performing DNS resolution and deciding what protocol to use.

In that case there's not much Firefox can do about it, as we don't even know the IP being used.

Implement a new RPMIsIPv6Connection() in NetErrorChild that checks if nsIHttpChannelInternal::remoteAddress contains ":".

As far as I understand it, Firefox does not attempt to do AAAA lookups if it does not believe IPv6 connectivity is available. As such, the remoteAddress will never contain ":" unless the user has explicitly provided an IPv6 address as the URL, which is very uncommon as 99% of all browsing is done via DNS names.

For DNS over HTTPS we actually do resolve both A and AAAA regardless of connectivity.
For getaddrinfo, whether AAAA records are returned for AF_UNSPEC when the system has no IPv6 connectivity is up to the system.

The more common scenario is that the user provides a hostname, Firefox attempts to perform an A lookup because it believes IPv6 connectivity is not available, does not find an address and fails without ever attempting to perform an AAAA lookup.
At https://www.ev6.net/v6sites.php you will find a list of over 50k hosts now which have AAAA records and no A records.

An AAAA lookup really needs to happen regardless, and appropriate error messages displayed if AAAA exists but A doesn't or is inaccessible.

Agreed.

There is also an inverse scenario, where a user has IPv6 connectivity but no legacy connectivity, and Firefox should be able to handle it too. I've seen this crop up quite a lot lately: legacy connectivity is often second class, provided through a NAT gateway while v6 is direct, so legacy connectivity breaks if the NAT gateway fails. Similarly, legacy DHCP pools can fill up (especially with mobile devices that now present random MAC addresses, so they get a new lease every time), which breaks legacy connectivity while v6 still works.

The use of NAT64/DNS64 is also extremely common on mobile networks, whereby an AAAA lookup will always succeed, so the link might be IPv6-only but legacy sites are still accessible anyway.

So it seems there are three problems:

  1. connectivity check does not handle proxies correctly and will return false results

The connectivity check currently tells us if the system has IPv4 or IPv6 connectivity.
Extending that to proxies is tricky, as some proxies only get used for some hosts.

  2. users are not given a useful error message if they try to access an unreachable site from a single stack network

We'll deal with that in this bug.

  3. the connectivity check state is only stored internally and not displayed to the user

We could add that info to about:networking I suppose.

In that case there's not much Firefox can do about it, as we don't even know the IP being used.

Correct, but that's because there is a single "detectportal" site which is dual stack.
If you follow the MS approach of using two different host records which are single stack then this would provide more accurate results when using a proxy, and would work unless the proxy operator intentionally cheats the process.

If a proxy is only used for specific hosts I would ignore it; it is only worth checking a proxy that's used for everything or one that's set as the default fallback after host-specific rules have been processed. I suppose detailed information about the proxy setup (e.g. a list of override URLs) could be present in about:networking too.

Using NS_HTTP_DISABLE_IPV4 / NS_HTTP_DISABLE_IPV6 is also potentially problematic on a NAT64/DNS64 network, as you can still access legacy sites but have to do so using v6 sockets to the synthesized DNS64 addresses.

macOS actually has a check for DNS64 - it performs an AAAA DNS lookup of "ipv4only.arpa" and if it gets a response it knows NAT64 is in use and can calculate the prefix.
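
The ipv4only.arpa technique is the one specified in RFC 7050: the name has only the A records 192.0.0.170 and 192.0.0.171, so any AAAA answer must have been synthesized by DNS64. The sketch below derives the prefix from such an answer. It is a simplification that handles only /96 prefixes and colon-hex notation, and the function names are hypothetical.

```javascript
// Expand an IPv6 string into eight 16-bit numbers, or null if malformed.
function expandIPv6(address) {
  const parts = address.split("::");
  if (parts.length > 2) return null;
  const head = parts[0] ? parts[0].split(":") : [];
  const tail = parts[1] ? parts[1].split(":") : [];
  const missing = parts.length === 2 ? 8 - head.length - tail.length : 0;
  if (missing < 0) return null;
  const groups = [...head, ...Array(missing).fill("0"), ...tail];
  if (groups.length !== 8) return null;
  if (!groups.every((g) => /^[0-9a-f]{1,4}$/i.test(g))) return null;
  return groups.map((g) => parseInt(g, 16));
}

// Given one AAAA answer for ipv4only.arpa, return the NAT64 /96 prefix in
// fully expanded form, or null if the answer does not embed the well-known
// IPv4 addresses in its last 32 bits.
function nat64PrefixFromIpv4OnlyArpa(aaaaAnswer) {
  const groups = expandIPv6(aaaaAnswer);
  if (!groups) return null;
  const [hi, lo] = groups.slice(6);
  // 192.0.0.170 is c000:00aa and 192.0.0.171 is c000:00ab in hex groups.
  if (hi !== 0xc000 || (lo !== 0x00aa && lo !== 0x00ab)) return null;
  const prefix = groups.slice(0, 6).map((g) => g.toString(16)).join(":");
  return `${prefix}:0:0/96`;
}
```

For the well-known prefix, the answer `64:ff9b::c000:aa` yields `64:ff9b:0:0:0:0:0:0/96`. A non-null result means legacy sites may still be reachable over IPv6 sockets even though the link itself is IPv6-only.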

the connectivity check state is only stored internally and not displayed to the user
We could add that info to about:networking I suppose.

It could probably do with being more prominent than that, as most users wouldn't know to check about:networking, just like most Windows users won't go into the interface properties to check. Plus, the about:networking panel isn't especially friendly to non-technical users.
I'd suggest something like a traffic light icon where green is full dual stack connectivity, yellow is partial connectivity and red is no internet / airgapped network only. Clicking on this could then bring up more detailed information and a user friendly explanation.

It is also worth showing the currently detected state in a convenient way whenever an error is encountered, especially for related errors. For example, if you get a timeout to a dual stack site from a single stack client, you have no idea if switching protocols would make a difference, so it could be suggested as a possible solution.

Something else that may be useful, is a link to one or more public site checker services, for instance https://semonto.com/tools/website-reachability-check

That way you could quickly rule out a problem with the target site vs a problem with your own connectivity.

(In reply to bert from comment #24)

Something else that may be useful, is a link to one or more public site checker services, for instance https://semonto.com/tools/website-reachability-check

I wouldn't use that tool, as it doesn't report anything about v4/v6 reachability. (which is exactly what the issue is)
You could take a look at the list of similar services on https://testmyconnection.net/ (still not perfect).
Regardless, linking to an external service/website on an error page seems like a very bad idea to me.

Ahh yes, https://testmyconnection.net/ is better as it does report more useful information.

Sure, linking to a random site probably isn't a good idea; having an official service for this purpose would be better.

Next we want to check if it's a routing issue by probing from different endpoints?

Come on peeps, that's overkill here. Let's focus just on DNS and successful connection from the user's host.
