Closed Bug 1174249 Opened 9 years ago Closed 6 years ago

Server not found possibly due to DNS issue.

Categories

(Core :: Networking: DNS, defect, P2)

38 Branch
x86_64
Windows 7
defect

Tracking

()

RESOLVED INACTIVE

People

(Reporter: tcflorea, Unassigned, NeedInfo)

References

Details

(Whiteboard: [necko-next])

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0
Build ID: 20150513174244

Steps to reproduce:

Relatively heavy usage of Firefox: ~50 tabs / 1 GB memory usage.
Unfortunately I don't know what exactly triggers the issue, as it happens randomly. I can only guess that this has something to do with a limit on the number of domain names accessed recently.


Actual results:

For any new web page access attempt, or even a page refresh, I get a "Server not found" message (sometimes immediately, sometimes after a few seconds during which it appears to be waiting for something).
The issue recovers by itself after a few minutes (1 min?), but once it has started it tends to happen again shortly, making Firefox unusable.
Accessing pages by IP address works fine while the issue occurs (e.g. accessing my home router using 192.168.1.254 works, but using BThomehub.home doesn't).
Only Firefox exhibits the issue: e.g. Internet Explorer or ping google.com work while Firefox returns "Server not found".
The suggestion below did not work:
http://kb.mozillazine.org/Error_loading_websites#Only_Mozilla_applications_are_having_problems
The only way to recover is to kill/restart Firefox.


Expected results:

Firefox should have sent a DNS request.
Using Wireshark I could see no DNS packet being sent out when I try to access a page in Firefox. (Interestingly there are netbios-ns packets sent for that page).
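The comparison described above (the OS resolving names while Firefox does not) can also be checked outside the browser. This is only a minimal sketch, assuming Python is available on the affected machine; system_resolves is a hypothetical helper name and is not part of Firefox.

```python
import socket

def system_resolves(host):
    """Return the sorted, de-duplicated addresses the OS resolver gives for host.

    An empty list means getaddrinfo itself failed, i.e. the operating
    system could not resolve the name either.
    """
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

If this returns addresses for a name while Firefox shows "Server not found" for the same name, the failure is more likely inside the browser than in the system resolver.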
OS: Unspecified → Windows 7
Hardware: Unspecified → x86_64
Another issue, possibly related to this one:
I'm using the Windows DirectAccess VPN.
When attempting to access an intranet host, I see 2 different entries in the DNS tab of about:networking:
intranet_host	ipv4	172.x.x.x	110
intranet_host	ipv4	fdcb:xxxx:xxxx::acxx:xxxx	109

Randomly, accessing the intranet_host will fail.
Accessing intranet_host.my_domain.com instead always works. Again, 2 entries exist in the DNS tab:
intranet_host.my_domain.com	ipv6	fdcb:xxxx:xxxx::acxx:xxxx	97
intranet_host.my_domain.com	ipv4	fdcb:xxxx:xxxx::acxx:xxxx	97

Note that in both cases an entry is listed with an inappropriate address family (an IPv6 address labelled ipv4).
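A mismatch like this can be checked mechanically when copying rows out of the about:networking DNS tab. The following is a rough sketch only; family_mismatch is a hypothetical helper, and the addresses in the usage note are documentation examples, not the masked addresses from this report.

```python
import ipaddress

def family_mismatch(listed_family, address):
    """Return True when the family label disagrees with the address itself.

    listed_family is the text shown in the DNS tab ("ipv4" or "ipv6");
    address is the address string shown on the same row.
    """
    actual = "ipv%d" % ipaddress.ip_address(address).version
    return listed_family.lower() != actual
```

For example, family_mismatch("ipv4", "2001:db8::1") flags the row, while family_mismatch("ipv4", "192.0.2.1") does not.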
You say you tried the suggestions from the mozillazine page. Could you confirm you now run with network.dns.disableIPv6=true and network.dns.disablePrefetch=true?

What do you see for regular hosts like google.com (the problem described comment 0) in about:networking -> DNS?

Did you try https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging ?
Component: Untriaged → Networking: DNS
Flags: needinfo?(tcflorea)
Product: Firefox → Core
Q: Could you confirm you now run with network.dns.disableIPv6=true and network.dns.disablePrefetch=true?
A: No. At this moment both are on default value: (network.dns.disableIPv6=false and network.dns.disablePrefetch=false). 

Q: What do you see (...) in about:networking -> DNS.
A: When the problem occurs, the about:networking -> DNS table is empty.

Q: Did you try https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging ?
A: Unfortunately I have not tried HTTP logging so far. I'll use it in the future for further debugging.
Flags: needinfo?(tcflorea)
Thanks for the quick reply!

>A. When the problem occurs about:networking -> DNS table is empty. 

Do you mean it's completely empty? That's strange. Do you see any errors in Tools -> Web developer -> Browser Console by any chance?

Since you see IPv6-related weirdness I'd suggest trying to run with IPv6 off.

Let's see if you see anything useful in the logs.

(I moved this to the component where someone working on our networking code might give further suggestions; I myself have no other ideas on how to debug this further, sorry!)
Flags: needinfo?(tcflorea)
You say "The issue recovers by itself after a few minutes (1 min?)" - do the resolved addresses appear on the DNS tab when Firefox recovers? Or is the tab still completely empty until you restart?
The empty about:networking -> DNS table is consistent with the fact that no DNS packet is sent across the network and a "Server not found" message is issued. Using different values for the network.dns.* settings (like network.dns.disableIPv6=true) does not seem to prevent the issue from occurring.
Maybe the IPv6-related issue is a different one after all...
I haven't looked at Tools -> Web developer -> Browser Console.
I have used Tools -> Web developer -> Networking -> Timing.
Normally it is something like:
DNS Resolution 0 ms
Connecting 314 ms
Sending 0 ms
Waiting 161 ms
Receiving 107 ms
When the DNS issue occurs the above sequence is:
DNS Resolution 200 ms <--- greater than 0
Connecting 0 ms <--- 0 for everything else
Sending 0 ms
Waiting 0 ms
Receiving 0 ms
Flags: needinfo?(tcflorea)
The reason I was wondering about the contents of about:networking is that it could indicate a problem with Firefox (typically the page displays the cached values of DNS responses). I have also encountered cases where the OS (Mac OS X) cached NXDOMAIN responses from a (broken) router, so no DNS request was sent over the network.
I'm using Firefox 40.0, Build id 20150807085045

I also have exactly the same problem as Tudor Florea. I get a sporadic "Server not found" message after surfing for a while. The only thing I can do is restart Firefox, or wait and hope Firefox will recover as Tudor explained.

I currently use the settings below, but it's not working.
network.dns.disableIPv6=true
network.dns.disablePrefetch=true

Any progress on this matter?
Same issue on Firefox 43.0.4 Windows XP.

* Happens after a few hours of usage (250 tabs in various tab groups). Does not resolve itself; requires a restart.
* Websites can be accessed via IP
* network.dns.disableIPv6 and network.dns.disablePrefetch didn't help
* No DNS queries sent out (Wireshark)
Whiteboard: [necko-backlog]
Same issue on FF 44.0.1 (and older versions too) on Ubuntu 14.04.03.
about:networking shows this:
Hostname|Family|Addresses
plus.google.com|ipv4|2607:f8b0:4006:80f::200e
So it looks like it treats IPv6 addresses as IPv4 and thus cannot connect.

Sometimes it resolves itself after a while, but still really annoying.
Update: Issue continues on 44.0.1 (Windows 7).  As ilj mentioned, it sometimes resolves itself after a while - but that is rare for me.   I literally restart my browser almost every hour nowadays.
Not sure if it's helpful, but there seems to be a correlation with losing the wifi signal on my laptop.
Watching each case closely, the correlation seems to exist. It also happens when I resume from suspend and a different network is available.
This issue is really frustrating and happens daily. Does anyone know a way we can get some attention on it?
I had a conversation with someone on IRC a while back (#firefox or #necko, I can't remember which).

We need to reproduce the issue while logging:
https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

"note that the logs might become quite large after a while.... if that happens set NSPR_LOG_MODULES=timestamp,nsHttp:5,nsHostResolver:5 and run it again.
my guess is that either a call to resolve a domain is stuck, or the DNS thread is stalled or dead"

On a side note, the frequency of the issue has increased. Yesterday I restarted Firefox 3 bloody times in a row, and on 2 of those the issue appeared instantly after launch. It's getting really frustrating to use Firefox these days.
A remark: 2 different issues have been described in this thread:
1. about:networking -> DNS is empty, resulting in "Server not found" for any web page
2. about:networking -> DNS shows an inappropriate address family for some entries, making some intranet web servers inaccessible (e.g.: some_host.my_company.com	ipv4	fdcb:XXXX:XXXX:XXXX::XXXX:XXXX)
While the second is easier to reproduce (and should be fixed), it is the first one that is frustrating.

In my case the issue no longer occurs so frequently (but still occurs from time to time). HTTP logging was not useful for me: the log file grew to 4 GB and Firefox became erratic before the DNS issue reproduced.
Issue still present in 47.0 as described by Tudor in Comment 16 above (case 1):
about:networking -> DNS is empty, and all DNS lookups fail (even "localhost").
Settings:
network.dns.disablePrefetch=true
network.dns.disableIPv6=true
network.dnsCacheExpiration=0
network.dnsCacheExpirationGracePeriod=0
network.dnsCacheEntries=0
network.proxy.socks_remote_dns=true
Can anyone reproduce the issue with HTTP logging enabled, please? That would be really, really helpful.
Flags: needinfo?(tcflorea)
Flags: needinfo?(ilia.rogov)
Flags: needinfo?(eboth)
Flags: needinfo?(bracket.prime)
I'm afraid I couldn't get Firefox to do any logging based on the instructions here:
https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

There are some missing line breaks in the command lines given for 32-bit systems (they are correct for x64), but that's not the issue.
The env variables are correctly set, and still Firefox does not log (even with add-ons disabled). Is there a setting inside Firefox that might have disabled logging?
Flags: needinfo?(eboth)
Sorry about that. The env variable names have been changed recently, do not ask me why. Which ones work depends on the version you use: NSPR_LOG_MODULES and NSPR_LOG_FILE are the old names (for 47 and older); MOZ_LOG and MOZ_LOG_FILE are for Firefox 49 and upcoming releases, as well as current Aurora and Nightly; and the third variant is the one on the page (only for 48).

Sorry for this.
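The version split described above can be written down as a small lookup. This is a sketch under the assumptions stated in that comment only; logging_env is a hypothetical helper, and the variable names for Firefox 48 are deliberately not guessed because this thread does not spell them out.

```python
def logging_env(major_version, modules, log_file):
    """Pick the logging environment variables for a Firefox major version.

    Returns a dict suitable for merging into the environment before
    launching Firefox.
    """
    if major_version <= 47:
        names = ("NSPR_LOG_MODULES", "NSPR_LOG_FILE")
    elif major_version >= 49:
        names = ("MOZ_LOG", "MOZ_LOG_FILE")
    else:
        # Firefox 48 used a third pair of names, given only on the
        # HTTP logging page, so it is not filled in here.
        raise ValueError("variable names for Firefox 48 are not given in this bug")
    return {names[0]: modules, names[1]: log_file}
```

For example, logging_env(47, "timestamp,nsHttp:5,nsHostResolver:5", "log.txt") yields the NSPR_* pair, and the same call with 49 yields the MOZ_* pair.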
I have tried a few times to reproduce the issue with logging activated, but unfortunately the log file grows rapidly and Firefox becomes non-responsive before I can reproduce the issue.
The log rotation mechanism that is supposed to limit the size of the log file(s) does not work as expected: instead of a few files of limited size I have:
09/29/2016  09:07 PM     1,774,919,929 log.txt
09/29/2016  09:19 PM                 0 log.txt.child-1
09/29/2016  09:27 PM                 0 log.txt.child-2
09/29/2016  10:14 PM                 0 log.txt.child-3
09/29/2016  10:20 PM                 0 log.txt.child-4
Trying to resolve any domain on about:networking DNS Lookup returns NS_ERROR_UNKNOWN_HOST while the issue is present.
Are you maybe using the latest nightly?
We have a new feature in about:networking called Logging. Hit start, reproduce your issue, and hit stop. Then upload your logs here if possible. Thanks!
I have tried Nightly and the Logging feature works as intended. Unfortunately I was not able to reproduce the issue after a few hours of stress usage. I had to uninstall Nightly as a few plugins do not work in it (e.g. the Widevine Content Decryption Module used by Netflix) and then reset add-ons and settings to get my Firefox back.
Is there a way to run Nightly in a sandbox (e.g. with its own profile and add-ons, in parallel with the latest Firefox)?
Yes. You can create a new profile by running firefox -P
Then run firefox -P new_profile -no-remote
The -no-remote option makes sure you can run this firefox in parallel with another.
Same problem in 32-bit 52.0b8 (20170220070057) on Windows 7 x64. As you can see Firefox refuses to resolve hosts even from hosts.txt.

My logs before and after restart:

Before restart:
2017-02-24 15:55:25.117000 UTC - [Main Thread]: D/nsHostResolver Resolving host [localhost].
2017-02-24 15:55:25.117000 UTC - [Main Thread]: D/nsHostResolver   No usable address in cache for host [localhost].
2017-02-24 15:55:25.117000 UTC - [Main Thread]: D/nsHostResolver   DNS thread counters: total=2 any-live=0 idle=2 pending=1
2017-02-24 15:55:25.117000 UTC - [Main Thread]: D/nsHostResolver   DNS lookup for host [localhost] blocking pending 'getaddrinfo' query: callback [afb4880]
2017-02-24 15:55:25.117000 UTC - [DNS Resolver #146]: D/nsHostResolver DNS lookup thread - Calling getaddrinfo for host [localhost].
2017-02-24 15:55:25.125000 UTC - [DNS Resolver #146]: D/nsHostResolver DNS lookup thread - lookup completed for host [localhost]: failure: unknown host.
2017-02-24 15:55:25.125000 UTC - [DNS Resolver #146]: D/nsHostResolver nsHostResolver record d9cb320 new gencnt
2017-02-24 15:55:25.125000 UTC - [DNS Resolver #146]: D/nsHostResolver Caching host [localhost] negative record for 60 seconds.
2017-02-24 15:55:25.125000 UTC - [DNS Resolver #146]: D/nsHostResolver Issuing second async lookup for TTL for host [localhost].
2017-02-24 15:55:25.125000 UTC - [DNS Resolver #146]: D/nsHostResolver   DNS thread counters: total=2 any-live=0 idle=1 pending=1
2017-02-24 15:55:25.126000 UTC - [DNS Resolver #146]: D/nsHostResolver DNS lookup thread - Calling getaddrinfo for host [localhost].
2017-02-24 15:55:25.133000 UTC - [DNS Resolver #146]: D/nsHostResolver DNS lookup thread - lookup completed for host [localhost]: failure: unknown host.
2017-02-24 15:55:25.133000 UTC - [DNS Resolver #146]: D/nsHostResolver nsHostResolver record d9cb320 new gencnt
2017-02-24 15:55:25.133000 UTC - [DNS Resolver #146]: D/nsHostResolver Caching host [localhost] negative record for 60 seconds.


After restart:
2017-02-24 16:07:39.966000 UTC - [Main Thread]: D/nsHostResolver Resolving host [localhost].
2017-02-24 16:07:39.966000 UTC - [Main Thread]: D/nsHostResolver   No usable address in cache for host [localhost].
2017-02-24 16:07:39.966000 UTC - [Main Thread]: D/nsHostResolver   DNS thread counters: total=3 any-live=0 idle=3 pending=1
2017-02-24 16:07:39.966000 UTC - [Main Thread]: D/nsHostResolver   DNS lookup for host [localhost] blocking pending 'getaddrinfo' query: callback [b23a040]
2017-02-24 16:07:39.966000 UTC - [DNS Resolver #159]: D/nsHostResolver DNS lookup thread - Calling getaddrinfo for host [localhost].
2017-02-24 16:07:39.985000 UTC - [DNS Resolver #159]: D/nsHostResolver DNS lookup thread - lookup completed for host [localhost]: success.
2017-02-24 16:07:39.986000 UTC - [DNS Resolver #159]: D/nsHostResolver nsHostResolver record d9cb320 new gencnt
2017-02-24 16:07:39.986000 UTC - [DNS Resolver #159]: D/nsHostResolver Caching host [localhost] record for 60 seconds (grace 0).
2017-02-24 16:07:39.986000 UTC - [DNS Resolver #159]: D/nsHostResolver Issuing second async lookup for TTL for host [localhost].
2017-02-24 16:07:39.986000 UTC - [DNS Resolver #159]: D/nsHostResolver   DNS thread counters: total=3 any-live=0 idle=2 pending=1
2017-02-24 16:07:39.986000 UTC - [DNS Resolver #146]: D/nsHostResolver DNS lookup thread - Calling getaddrinfo for host [localhost].
2017-02-24 16:07:39.986000 UTC - [Main Thread]: D/nsHostResolver Checking blacklist for host [localhost], host record [d9cb320].
2017-02-24 16:07:39.986000 UTC - [Main Thread]: D/nsHostResolver Checking blacklist for host [localhost], host record [d9cb320].
2017-02-24 16:07:39.999000 UTC - [DNS Resolver #146]: D/nsHostResolver DNS lookup thread - lookup completed for host [localhost]: success.
2017-02-24 16:07:39.999000 UTC - [DNS Resolver #146]: D/nsHostResolver different_rrset localhost
2017-02-24 16:07:39.999000 UTC - [DNS Resolver #146]: D/nsHostResolver different_rrset add to set 1 127.0.0.1
2017-02-24 16:07:39.999000 UTC - [DNS Resolver #146]: D/nsHostResolver different_rrset add to set 2 127.0.0.1
2017-02-24 16:07:39.999000 UTC - [DNS Resolver #146]: D/nsHostResolver different_rrset false
2017-02-24 16:07:39.999000 UTC - [DNS Resolver #146]: D/nsHostResolver Caching host [localhost] record for 60 seconds (grace 0).
Attached file http logs
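For longer logs, the "lookup completed" lines shown above can be tallied per host to see when lookups flip from success to failure. This is only an illustrative sketch; lookup_outcomes is a hypothetical helper and the pattern assumes the exact line wording seen in the excerpts above.

```python
import re
from collections import Counter

# Matches e.g. "lookup completed for host [localhost]: failure: unknown host."
COMPLETED = re.compile(r"lookup completed for host \[([^\]]*)\]: (success|failure)")

def lookup_outcomes(log_lines):
    """Count (host, outcome) pairs among nsHostResolver 'lookup completed' lines."""
    counts = Counter()
    for line in log_lines:
        match = COMPLETED.search(line)
        if match:
            counts[match.groups()] += 1
    return counts
```

Calling it on an open log file, e.g. lookup_outcomes(open("log.txt")), processes the file line by line without loading it all into memory, which matters given the log sizes reported in this bug.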
Same problem for me, reoccurring more and more frequently. I have to restart my browser a few times a day now. DNS table in about:networking is always empty at that time.
I have found some info regarding a possible fix:
https://stackoverflow.com/questions/30360029/if-getaddrinfo-fails-once-it-fails-forever-even-after-network-is-ready
So it seems network changes might mess with the system resolver, so we need to call res_init() once that happens. It seems this fix doesn't apply to Windows, so I'm looking for a separate fix.
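The idea in the comment above (reinitialise the resolver after a failure, then retry) can be sketched as follows. This is not the Firefox fix and not a tested remedy; resolve_with_reinit is a hypothetical helper, and reinit is a placeholder hook standing in for a call such as res_init(), which the caller would have to supply for their platform.

```python
import socket

def resolve_with_reinit(host, resolve=socket.getaddrinfo, reinit=lambda: None):
    """Resolve host; on failure, reinitialise the resolver once and retry.

    reinit is a hypothetical hook standing in for res_init(). The default
    does nothing, so platform-specific code must be passed in by the caller.
    """
    try:
        return resolve(host, None)
    except socket.gaierror:
        # Give the resolver a chance to reload its configuration
        # (e.g. after a network change), then try exactly once more.
        reinit()
        return resolve(host, None)
```

A second failure is allowed to propagate, so the caller can still tell a persistent resolver problem from a transient one.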
Status: UNCONFIRMED → NEW
Ever confirmed: true
Whiteboard: [necko-backlog] → [necko-next]
I think this piece of information might be useful and just wanted to bring it back to attention:
Using Wireshark I could see no DNS packet being sent out when I try to access a page in Firefox. _Interestingly there are netbios-ns packets sent for that page_.
Flags: needinfo?(tcflorea)
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P2
I'd like to close this bug since it's been a while without activity.
If anyone still has DNS issues, please try using TRR.
https://blog.nightly.mozilla.org/2018/06/01/improving-dns-privacy-in-firefox/
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE
Resolution: INCOMPLETE → INACTIVE