Closed
Bug 193827
Opened 22 years ago
Closed 20 years ago
DNS sometimes hangs in TCP mode
Categories
(Core :: Networking, defect)
Tracking
()
VERIFIED
WORKSFORME
People
(Reporter: mal, Assigned: gordon)
References
()
Details
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

Once in a while the DNS resolver in mozilla-1.0.1-2.7.3 (RedHat 7.3) hangs. It prints "resolving host.name.com" in the status line and cannot open the site. Already opened sites work OK, but new sites cannot be resolved. Exiting mozilla does not help, because some mozilla processes keep running (see below). The only fix is to run killall mozilla-bin (sometimes it needs to be done several times). Only after this can a new instance of mozilla be started. This is very annoying.

The problem seems to have existed since the time of Netscape 3 and in all Mozilla versions. I have had this (very annoying) problem on a number of Linux/Solaris versions with a variety of mozilla versions.

P.S. When DNS hangs, these are the processes still running in the background after I exit mozilla:

ps axuww|grep mozilla
mal 14281 0.2 17.0 63140 43712 ? S Feb17 2:02 /usr/lib/mozilla/mozilla-bin
mal 14287 0.0 17.0 63140 43712 ? S Feb17 0:00 /usr/lib/mozilla/mozilla-bin
mal 14289 0.0 17.0 63140 43712 ? S Feb17 0:00 /usr/lib/mozilla/mozilla-bin
mal 14291 0.0 17.0 63140 43712 ? S Feb17 0:01 /usr/lib/mozilla/mozilla-bin

Reproducible: Sometimes

Steps to Reproduce:
This happens on average once a week.
Reporter | ||
Comment 1•22 years ago
|
||
Note that when I run dig host.name.com on the command line I get a proper response. DNS itself is working. This is a problem with the mozilla DNS resolver.
Reporter | ||
Comment 2•22 years ago
|
||
Also, as pointed out in http://bugzilla.mozilla.org/show_bug.cgi?id=188332, the site http://story.news.yahoo.com/ always causes this problem. But the command

# host story.news.yahoo.com

shows correct DNS resolution:

story.news.yahoo.com is an alias for dailynews.yahoo.com.
dailynews.yahoo.com is an alias for dailynews.yahoo.akadns.net.
dailynews.yahoo.akadns.net has address 64.58.76.117
.
Assignee: asa → dougt
Component: Browser-General → Networking
QA Contact: asa → benc
Comment 4•22 years ago
|
||
-> invalid because the build is too old and you are using a non-mozilla.org build.
Status: UNCONFIRMED → RESOLVED
Closed: 22 years ago
Resolution: --- → INVALID
Reporter | ||
Comment 5•22 years ago
|
||
It is not that old (Oct 2002), just three months. And I bet the same error is also present in new builds. You should test and find the cause of this very annoying bug (people often need to reboot their computers because of it) rather than mark a valid and well-described report as invalid.
Comment 6•22 years ago
|
||
Vladislav, if you can reproduce this bug on a new, Mozilla.org build then please reopen this report. For a bug to be valid, it must be reproducible on a build < 1 month old.
Reporter | ||
Comment 7•22 years ago
|
||
Can you check the web site http://story.news.yahoo.com/ from your new browser? It often (but not always) hangs on mine.
Comment 8•22 years ago
|
||
This URL does not hang for me using 20030210 on WinXP.
Reporter | ||
Comment 9•22 years ago
|
||
Exactly the same DNS problem exists with Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3b) Gecko/20030210. I just clicked on http://story.news.yahoo.com/ and got the same thing: "resolving story.news.yahoo.com", and a few processes in the background that remain even after I exit mozilla.
Status: RESOLVED → UNCONFIRMED
Resolution: INVALID → ---
Reporter | ||
Comment 10•22 years ago
|
||
Also note that the http://story.news.yahoo.com/ sometimes works. Same story as I mentioned earlier.
Updated•22 years ago
|
Reporter | ||
Comment 11•22 years ago
|
||
In case this is important: my Linux box (on which mozilla has this problem) has a 192.168.x.x IP and sits behind another Linux box (masquerading). The DNS servers are

nameserver 194.8.160.90
nameserver 195.131.52.130

DNS is working OK. host story.news.yahoo.com always shows the right IP:

story.news.yahoo.com is an alias for dailynews.yahoo.com.
dailynews.yahoo.com is an alias for dailynews.yahoo.akadns.net.
dailynews.yahoo.akadns.net has address 64.58.76.117

One thing you may be interested in: dig sometimes gets the DNS response in TCP mode for this specific host, story.news.yahoo.com:

-----------
dig story.news.yahoo.com A
;; Truncated, retrying in TCP mode.
.......

I am not sure this is related; it seems not. Also, it seems the mozilla DNS resolver does not have its own timeout set right.
Reporter | ||
Comment 12•22 years ago
|
||
Also, this may be related (but note that the mozilla DNS resolver timeout is wrong anyway). From time to time I get this with dig:

-------------------------------------------
$ dig story.news.yahoo.com a
;; Truncated, retrying in TCP mode.

; <<>> DiG 9.2.1 <<>> story.news.yahoo.com a
;; global options: printcmd
;; connection timed out; no servers could be reached
------------------------------------------------

No such problem with other hosts.
Reporter | ||
Comment 13•22 years ago
|
||
I bet this is somehow related to a timeout when the DNS request is made in TCP mode. You can imitate it with this:

dig +tcp story.news.yahoo.com a

The mozilla DNS resolver does not seem to work right in TCP mode. (Also note that the timeouts are wrong too.)
Reporter | ||
Comment 14•22 years ago
|
||
This is what happens when the mozilla DNS resolver hangs (Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3b) Gecko/20030210):

1. It tries to query DNS via UDP, which somehow fails.
2. It then keeps trying to reach the DNS server via TCP, gets no reply, but continues to try.

This seems to be the same error as with dig in the example above, but dig has a proper timeout set and mozilla does not. As a result the resolver just hangs. This problem exists in every mozilla and netscape version I have tried.

598 PROTO=UDP SPT=53 DPT=32794 LEN=158
Feb 19 13:58:05 hnx kernel: IN= OUT=eth0 SRC=192.168.3.97 DST=194.8.160.90 LEN=66 TOS=0x00 PREC=0x00 TTL=64 ID=4660 DF PROTO=UDP SPT=32794 DPT=53 LEN=46
Feb 19 13:58:05 hnx kernel: IN=eth0 OUT= MAC=00:d0:b7:07:7b:f1:00:d0:b7:07:7b:f0:08:00 SRC=194.8.160.90 DST=192.168.3.97 LEN=536 TOS=0x00 PREC=0x00 TTL=61 ID=42793 PROTO=UDP SPT=53 DPT=32794 LEN=516
Feb 19 13:58:05 hnx kernel: IN= OUT=eth0 SRC=192.168.3.97 DST=194.8.160.90 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=16898 DF PROTO=TCP SPT=47409 DPT=53 WINDOW=5840 RES=0x00 SYN URGP=0
Feb 19 13:58:08 hnx kernel: IN= OUT=eth0 SRC=192.168.3.97 DST=194.8.160.90 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39860 DF PROTO=TCP SPT=47409 DPT=53 WINDOW=5840 RES=0x00 SYN URGP=0
Feb 19 13:58:14 hnx kernel: IN= OUT=eth0 SRC=192.168.3.97 DST=194.8.160.90 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39861 DF PROTO=TCP SPT=47409 DPT=53 WINDOW=5840 RES=0x00 SYN URGP=0
Feb 19 13:58:26 hnx kernel: IN= OUT=eth0 SRC=192.168.3.97 DST=194.8.160.90 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39862 DF PROTO=TCP SPT=47409 DPT=53 WINDOW=5840 RES=0x00 SYN URGP=0
Feb 19 13:58:50 hnx kernel: IN= OUT=eth0 SRC=192.168.3.97 DST=194.8.160.90 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39863 DF PROTO=TCP SPT=47409 DPT=53 WINDOW=5840 RES=0x00 SYN URGP=0
Feb 19 13:59:38 hnx kernel: IN= OUT=eth0 SRC=192.168.3.97 DST=194.8.160.90 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39864 DF PROTO=TCP SPT=47409 DPT=53 WINDOW=5840 RES=0x00 SYN URGP=0
Feb 19 14:01:14 hnx kernel: IN= OUT=eth0 SRC=192.168.3.97 DST=195.131.52.130 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39865 DF PROTO=TCP SPT=47410 DPT=53 WINDOW=5840 RES=0x00 SYN URGP=0
Feb 19 14:01:17 hnx kernel: IN= OUT=eth0 SRC=192.168.3.97 DST=195.131.52.130 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=10686 DF PROTO=TCP SPT=47410 DPT=53 WINDOW=5840 RES=0x00 SYN URGP=0
Feb 19 14:01:23 hnx kernel: IN= OUT=eth0 SRC=192.168.3.97 DST=195.131.52.130 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=10687 DF PROTO=TCP SPT=47410 DPT=53 WINDOW=5840 RES=0x00 SYN URGP=0
Feb 19 14:01:35 hnx kernel: IN= OUT=eth0 SRC=192.168.3.97 DST=195.131.52.130 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=10688 DF PROTO=TCP SPT=47410 DPT=53 WINDOW=5840 RES=0x00 SYN URGP=0
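The timing of the TCP SYN packets in the log above can be worked out with a few lines of Python. This is only an illustrative calculation on the timestamps copied from the log; the variable and function names are made up for this sketch. It shows the gaps between retransmitted SYNs doubling, and how long the resolver waited before moving on to the second nameserver.

```python
from datetime import datetime

# Timestamps of the outgoing SYN packets to 194.8.160.90, copied from the log.
SYN_TIMES = ["13:58:05", "13:58:08", "13:58:14", "13:58:26", "13:58:50", "13:59:38"]
# First SYN sent to the second nameserver (195.131.52.130), also from the log.
NEXT_SERVER_FIRST_SYN = "14:01:14"


def seconds_between(times):
    """Return the gaps, in whole seconds, between consecutive HH:MM:SS stamps."""
    parsed = [datetime.strptime(t, "%H:%M:%S") for t in times]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]


gaps = seconds_between(SYN_TIMES)
total = seconds_between([SYN_TIMES[0], NEXT_SERVER_FIRST_SYN])[0]

print(gaps)   # [3, 6, 12, 24, 48]
print(total)  # 189
```

About 189 seconds pass per nameserver before the next one is tried, which looks like a connect() left to the kernel's own SYN retry limit rather than one bounded by an application-level timeout. That reading is an inference from the log, not something confirmed in this report.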
Comment 15•21 years ago
|
||
I thought TCP mode was only used for: 1- named-xfer 2- responses that do not fit in a single UDP datagram. Is there any indication why TCP is used? A client should probably never end up in TCP mode for DNS.
Flags: blocking1.4a?
Hardware: Other → PC
Summary: DNS sometimes hangs. → DNS sometimes hangs in TCP mode
Reporter | ||
Comment 16•21 years ago
|
||
As far as I know a client may use TCP mode, and the DNS resolvers that come with glibc (and bind-utils) sometimes use DNS in TCP mode. It definitely happens when a response does not fit in a single UDP datagram. See http://colalug.org/members/resources/IP-Chains/HOWTO-5.html

--------------------
5.2 What not to filter out.

TCP connections to DNS (nameservers). If you're trying to block outgoing TCP connections, remember that DNS doesn't always use UDP; if the reply from the server exceeds 512 bytes, the client uses a TCP connection (still going to port number 53) to get the data. This can be a trap because DNS will `mostly work' if you disallow such TCP transfers; you may experience strange long delays and other occasional DNS problems if you do.
--------------------

It is pretty common to have a long DNS response which goes via TCP. glibc definitely uses both UDP and TCP DNS in client mode. mozilla could simply be linked against the DNS resolver provided by the underlying OS. Such a resolver is usually stable and has the right timeouts. For example (on Linux) the libraries /usr/lib/libdns.so.5.3.0 from bind-utils or /lib/libresolv-2.2.5.so from glibc seem to do the right thing. Similar libraries exist in every OS; I think there is no need to duplicate DNS code in mozilla.

man resolver (on Linux)

RESOLVER(3) Linux Programmer's Manual RESOLVER(3)
NAME
res_init, res_query, res_search, res_querydomain, res_mkquery, res_send, dn_comp, dn_expand - resolver routines
SYNOPSIS
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>
extern struct state _res;

rpm -ql glibc |grep resol
/lib/libresolv-2.2.5.so
/lib/libresolv.so.2

Also see the bind-utils package.

rpm -ql bind-utils-9.2.1-0.7x
/usr/bin/dig
/usr/bin/host
/usr/bin/nslookup
/usr/lib/libdns.so.5
/usr/lib/libdns.so.5.3.0
/usr/lib/libisc.so.4
/usr/lib/libisc.so.4.1.0
/usr/share/man/man1/dig.1.gz
/usr/share/man/man1/host.1.gz
/usr/share/man/man5/resolver.5.gz
/usr/share/man/man8/nslookup.8.gz
/usr/share/man/man8/nsupdate.8.gz
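To make the UDP-then-TCP behaviour described above concrete, here is a minimal Python sketch of a stub resolver query. It is not Mozilla or glibc code. It builds a query in the RFC 1035 wire format, checks the TC (truncated) bit of the UDP reply, and only then retries over TCP, with both steps bounded by an explicit timeout. The function names, the query ID and the 5-second default are assumptions chosen for illustration.

```python
import socket
import struct


def build_query(name, qid=0x1234, qtype=1):
    """Build a minimal DNS query: 12-byte header plus one question (class IN)."""
    # Flags 0x0100 = recursion desired; one question, no other records.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def is_truncated(response):
    """Return True if the TC bit (0x0200 in the flags word) is set."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & 0x0200)


def _recv_exact(sock, count):
    """Read exactly `count` bytes from a stream socket."""
    buf = b""
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            raise ConnectionError("server closed the connection early")
        buf += chunk
    return buf


def query(name, server, timeout=5.0):
    """Query over UDP; if the reply is truncated, retry once over TCP.

    Every network step is limited to `timeout` seconds, so a dropped
    TCP SYN raises an exception instead of hanging for minutes.
    """
    message = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(message, (server, 53))
        reply, _ = udp.recvfrom(512)
    if not is_truncated(reply):
        return reply
    # Over TCP each DNS message is prefixed with a 2-byte length field.
    with socket.create_connection((server, 53), timeout=timeout) as tcp:
        tcp.sendall(struct.pack("!H", len(message)) + message)
        (length,) = struct.unpack("!H", _recv_exact(tcp, 2))
        return _recv_exact(tcp, length)
```

Calling query("story.news.yahoo.com", "194.8.160.90") against a server that silently drops TCP port 53 would raise a timeout error after about five seconds instead of stalling. The sketch does not parse the answer records.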
Reporter | ||
Comment 17•21 years ago
|
||
Also, from man resolver (on RedHat Linux 7.3):

RES_USEVC Use TCP connections for queries rather than UDP datagrams.
RES_IGNTC Ignore truncation errors. Don't retry with TCP. [Not currently implemented].

Mozilla's current DNS resolver does not seem to work right, while the underlying OS DNS resolver is OK. Why not just use it?
Reporter | ||
Comment 18•21 years ago
|
||
By the way, if you need to reproduce this DNS problem, it is rather easy on Linux.

1. Deny outgoing TCP traffic to your DNS server(s):
iptables -I OUTPUT -p TCP -d your.dns.ip -j DROP
2. Visit any web site whose DNS response is >512 bytes, like http://story.news.yahoo.com/ (but this site is strange: sometimes the DNS server gives a <512 byte response for it). A better option is to craft a special DNS site with a bunch of IPs/nameservers.
3. See the stalls.

In my opinion the best option is to use the underlying OS DNS resolver rather than a home-grown one. The problem is extremely annoying. I have seen people who needed to reboot their computer (few figure out that they can do killall mozilla-bin on Linux, or find and kill the process in the MS-Windows task manager).
Comment 19•21 years ago
|
||
+clean-report, cc gordon. I *know* why it does a mode switch (in fact another case is if you have an MTU that is very small...) but I didn't realize that it happened so easily in real life. If I were a working/breathing hostmaster, I would never configure my domain to return such large responses.

We've had this problem on and off for a long time for a couple of sites, and nobody ever put their finger on it.

I don't understand the DNS/TCP timeout. Shouldn't we just call the resolver, and it manages the TCP timeout values? I can see that this jams up the DNS service, because we have a serialized service for DNS.

So, if I understand this correctly, this only happens sometimes, and if it does, the TCP connection actually fails (so you'd see a SYN_SENT entry in netstat -tcp )?
Keywords: clean-report
Reporter | ||
Comment 20•21 years ago
|
||
>We've had this problem on and off for a long time for a couple sites, and nobody
>ever put their finger on it.

Now it is very common. In particular, some sites in .aol.com networks have a very long list of nameservers/IPs. Also, some sites behave very strangely: the DNS response they return is sometimes too long for UDP and sometimes not.

>I don't understand the DNS/TCP timeout. Shouldn't we just call the resolver, and
>it manages the TCP timeout values? I can see that this jams up the DNS service,
>because we have a serialized service for DNS.

Back in netscape 2.x and early versions of 3.x on UNIX, the DNS lookup was done directly via the UNIX resolver from the thread that does graphical repainting. That was OK, except that no button could be clicked during a lookup (because it ran on the repainting thread), and some computers had weird DNS timeouts (like 5 minutes).

I personally would use the system resolver function (or whatever comes with other OSes like Windows) with a reasonable DNS (UDP/TCP) timeout value (preferably set explicitly in prefs.js, but a default-only value is also OK), and do the lookup from a separate thread, as mozilla does right now, so the user can click on buttons during a DNS lookup. I did not check whether the recent Linux API makes it possible to set the TCP "connect" timeout for a DNS TCP lookup done via the "resolver" function. If not, a common approach such as an alarm() signal can be used to terminate a thread whose lookup takes too long.

Another important resolver feature to add: if the user clicks the "STOP" button during a DNS lookup, the lookup should be aborted immediately (for example by sending the same alarm signal to the thread doing the lookup). There should also be no "runaway" DNS lookup processes, as happens right now after exiting mozilla when the failure described in this bug #193827 occurs.

This is my personal opinion; I may have missed something.
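The idea of running the OS resolver in a separate thread and giving up after a deadline can be sketched as follows. This is an illustration in Python, not Mozilla code: the name resolve_with_deadline, the injectable lookup parameter and the 10-second default are assumptions. It uses a daemon thread, so a lookup that never returns cannot keep the process alive at exit.

```python
import queue
import socket
import threading


def resolve_with_deadline(host, deadline=10.0, lookup=socket.getaddrinfo):
    """Return sorted address strings, or None if no answer within `deadline` seconds.

    `lookup` defaults to the OS resolver and is a parameter only so that
    the function can be exercised without touching the network.
    """
    results = queue.Queue(maxsize=1)

    def worker():
        try:
            results.put(("ok", lookup(host, None)))
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            results.put(("error", exc))

    # Daemon thread: a stuck lookup is abandoned and does not block exit.
    threading.Thread(target=worker, daemon=True).start()
    try:
        status, value = results.get(timeout=deadline)
    except queue.Empty:
        return None
    if status == "error":
        raise value
    # Each getaddrinfo entry ends with a sockaddr tuple; item 0 is the address.
    return sorted({info[4][0] for info in value})
```

Note that this only stops the caller from waiting. The abandoned thread keeps running until the underlying lookup returns, so it does not cancel the query itself, which is the harder part discussed in this comment.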
>So, if I understand this correctly, this only happens sometimes,
>and if it does, the TCP connection actually fails
>(so you'd see a SYN_SENT entry in netstat -tcp )?

Yes. My DNS provider (wrongly) does not allow DNS lookups via TCP; all TCP requests to port 53 are simply denied without an ICMP reply.

-----------------
$ telnet 194.8.160.90 53
Trying 194.8.160.90...
=== and 20 minutes later ===
Connection timeout
-----------------

The problem described in this bug #193827 occurs (almost) always when I get this from dig:

-----------------
$ dig story.news.yahoo.com a
;; Truncated, retrying in TCP mode.
-----------------

But dig occasionally manages to get a response in UDP mode, and sometimes not. This is why the problem is intermittent. I think the described problem occurs when the UDP response was truncated, mozilla retried over TCP, and the TCP connection failed.

The reason I (and many other people I know) noticed this: while such a multiple failure is rare by itself (at another location, where the DNS server handles TCP requests properly, it happens to me about once a month), once it happens mozilla is in an unusable state. You need to reboot the computer or kill the process manually. This is extremely annoying, and it is why people notice this.

In your bugzilla database alone I found several such reports about mozilla being in an unusable state:
http://bugzilla.mozilla.org/show_bug.cgi?id=192271
http://bugzilla.mozilla.org/show_bug.cgi?id=188332
(and probably many others, because some people attribute this to stalls, not to DNS).
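The telnet check above can also be scripted with an explicit timeout, so that a server which silently drops TCP port 53 is detected in seconds rather than minutes. A small Python sketch; the function name and the 5-second default are assumptions for illustration.

```python
import socket


def tcp_dns_reachable(server, timeout=5.0, port=53):
    """Return True if a TCP connection to server:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((server, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections and unreachable hosts
        return False
```

For example, tcp_dns_reachable("194.8.160.90") would be expected to return False on the network described in this comment, where TCP requests to port 53 get no reply.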
Comment 21•21 years ago
|
||
So is this report confirmed? Why is it still in the Unconfirmed state?
Comment 22•21 years ago
|
||
There is enough data to have an engineer research the TCP timeout suggested by the dig output. I've looked for an API that exposes a TCP timeout for the resolver, and I have been unable to figure out why dig works and mozilla doesn't. I have also discussed this with Darin, and he doesn't know of any obvious explanations for the information we have here. There is more analysis I could do if you made it a 1.4a blocker, which would take me off cookies for half a day.
Updated•21 years ago
|
Flags: blocking1.4a? → blocking1.4a-
Reporter | ||
Comment 23•21 years ago
|
||
This problem appears to be much more common than I originally thought. What seems to be happening:

1. mozilla sends a DNS query via UDP, but the datagram gets dropped on its way to the DNS server because of high network traffic.
2. mozilla then tries to send the DNS query via TCP and runs into the problem described in this bug.

This behaviour occurs pretty commonly. And what is most annoying: once this scenario has happened, mozilla is in an unusable state. Even exiting mozilla does not help: several mozilla processes are still running in the background (see below). The user needs to either reboot the computer or run

killall mozilla-bin

Otherwise a new mozilla instance cannot be started.

See another bug, http://bugzilla.mozilla.org/show_bug.cgi?id=192271, with similar runaway DNS query processes in the background. Or search google: there are a number of postings about mozilla forcing users to reboot their computers.

P.S. Processes in the background after mozilla exit:

ps axuww|grep mozilla
mal 1239 2.3 19.4 62616 49532 ? S 03:00 3:35 /usr/lib/mozilla/mozilla-bin
mal 1245 0.0 19.4 62616 49532 ? S 03:00 0:00 /usr/lib/mozilla/mozilla-bin
mal 1247 0.0 19.4 62616 49532 ? S 03:00 0:00 /usr/lib/mozilla/mozilla-bin
mal 1249 0.0 19.4 62616 49532 ? S 03:00 0:01 /usr/lib/mozilla/mozilla-bin
mal 17105 0.0 19.4 62616 49532 ? S 05:11 0:00 /usr/lib/mozilla/mozilla-bin
Flags: blocking1.4b?
Assignee | ||
Comment 24•21 years ago
|
||
re comment #23: Vladislav, how are you exiting mozilla?
Reporter | ||
Comment 25•21 years ago
|
||
I exit mozilla from the main menu via File->Quit. All mozilla windows get closed, but mozilla processes are still running in the background. They show up in

ps axuww|grep mozilla
Comment 26•21 years ago
|
||
Dougt, can you look into this for 1.4?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Updated•21 years ago
|
Flags: blocking1.4b?
Flags: blocking1.4b-
Flags: blocking1.4?
Updated•21 years ago
|
Flags: blocking1.4? → blocking1.4-
Comment 28•21 years ago
|
||
We've had this problem for a long time. If I could show that there are many other cases of this problem, and that they are causing people to restart their browser, would that justify blocking 1.4? Or if I could reproduce this internally, would it justify getting some engineering analysis? At this time, I don't think that clearing the DNS cache is a workaround. When this happens, your DNS is broken until you restart.
Reporter | ||
Comment 29•21 years ago
|
||
>and they are causing people to restart their browser
You cannot just restart the browser; you need to kill it manually
using the UNIX kill command.
A browser restart simply does not help.
Regarding reproducing this bug,
the easiest way seems to be the following:
1. Create a DNS record which is long enough
that the request is made in UDP and then in TCP mode.
2. Set the firewall on the client machine
(using ipchains or iptables on Linux) to drop DNS requests in TCP mode.
Then I believe the problem can be reproduced.
Reporter | ||
Comment 30•21 years ago
|
||
>Browser restart just does not help.
I meant that File->Quit does not terminate mozilla once the problem
in question has occurred.
Some processes are left in the background after File->Quit.
Comment 31•21 years ago
|
||
Vladislav: the problem of mozilla not exiting on a stalled DNS lookup is bug 192271.
Comment 32•21 years ago
|
||
This is probably a duplicate of bug 192271... or at least, once that bug is fixed, this one should pretty much fall by the wayside. Marking as a dependency for now.
Depends on: 192271
Comment 33•20 years ago
|
||
This was probably fixed by the DNS service rewrite. The DNS problem is not mozilla's fault, since it just calls getaddrinfo() or gethostbyname(). What was a problem is that mozilla hangs on exit if a DNS lookup is pending. Marking WORKSFORME since last comment was rather a long time ago. Reporter, please reopen if you still see this on a recent build.
Status: NEW → RESOLVED
Closed: 22 years ago → 20 years ago
Resolution: --- → WORKSFORME