Closed Bug 64857 Opened 23 years ago Closed 22 years ago
Conn: Mozilla does not recognize DNS server changes (DHCP)
Mozilla does not seem to recognize when DHCP changes the computer's DNS server. I've got a laptop running Mandrake Linux 7.2. I'm working in the office, running Mozilla (01-06-01 build). Networking (including the DNS server) is configured using DHCP. Time to go home, so I suspend my laptop and head out (leaving Mozilla open). When I get home, I plug into my home network and do a DHCP release/renew to get a home IP address. This changes resolv.conf and sets up DNS to use my home DNS server. I'm able to check email, access telnet/ssh, etc. fine using the home DNS, but Mozilla doesn't seem to recognize that the DNS server has changed, so name lookups fail. I have to quit and restart Mozilla in order to use it.
Looked for a roaming bug but couldn't find one to match this bug, so I am going to go ahead and mark it NEW.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Assignee: neeti → darin
what version of glibc are you using? thx! ls /lib/libc*
glibc-2.1.3 is the version.
blanders@newvaio:~/tmp/mozilla-01.13.00/package> ls -l /lib/libc*
lrwxrwxrwx 1 root root 13 Nov 4 18:33 /lib/libc.so.6 -> libc-2.1.3.so*
-rwxr-xr-x 1 root root 910k Oct 4 12:26 /lib/libc-2.1.3.so*
Also reproduced on FreeBSD 4-STABLE with mozilla-0.8 built from source. FreeBSD doesn't use glibc. Must exit and restart mozilla to enable DNS queries.
uname -a
FreeBSD scott.renfro.org 4.3-BETA FreeBSD 4.3-BETA #5: Sat Mar 10 16:45:03 PST 2001 email@example.com:/usr/obj/usr/src/sys/SCOTT-Z505LS i386
I looked, but didn't see any sethostent()/endhostent() calls, which might have explained the behavior.
Scott: Thank you for pointing me to sethostent/endhostent... I was not familiar with these methods. I just assumed that the underlying netdb implementation would have to handle changes to /etc/resolv.conf automatically. But, now with a way to tell DNS to restart, we should be set!
Status: NEW → ASSIGNED
Target Milestone: Future → mozilla0.9
Looks like NSPR doesn't export any equivalent to sethostent/endhostent...
Sorry about the confusion; I meant that sethostent()/endhostent() could *cause* this kind of problem, not resolve it. I've since confirmed that this behavior is a 'feature' of the FreeBSD libc, and apparently glibc as well. Neither one stat(2)s /etc/resolv.conf to detect changes, so the list of nameservers is fixed upon the first resolver call by the application. There is a thread on libc-alpha that discusses this issue and concludes that libc shouldn't punish all callers with a stat(2) on every resolver call. The only options, then, are to a) retain the current behavior (i.e., restart the application on changes to resolv.conf) or b) have mozilla detect resolv.conf changes and trigger a rescan. The initial scan of resolv.conf is performed by res_init(3). It appears that unsetting the RES_INIT bit in _res.options, then calling res_init(3), would perform a rescan, and that we could do this either on a detected change to resolv.conf or just periodically. I have not yet had a chance to actually test this, however. When I get some time, I'll look at the res_init(3) code and test the rescan behavior, then post an update. We'd also, obviously, have to ensure that any such code was portable or only used where supported. http://sources.redhat.com/ml/libc-alpha/2001-01/msg00077.html
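For illustration, option (b) could look roughly like the sketch below. It is not the patch that landed for this bug. The helper names `resolv_conf_changed` and `rescan_if_changed` are made up for this example, and the code assumes that a changed modification time on resolv.conf is a good enough signal and that a plain res_init(3) call re-reads the file on the target platform. Thread safety is not addressed.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>
#include <time.h>

/* Hypothetical helper: report whether the file at `path` has a different
 * modification time than the one remembered in *last_mtime, and remember
 * the new value. Returns 1 if changed, 0 if unchanged or unreadable. */
int resolv_conf_changed(const char *path, time_t *last_mtime)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;               /* cannot stat the file: nothing to rescan */
    if (st.st_mtime == *last_mtime)
        return 0;               /* same timestamp as last time */
    *last_mtime = st.st_mtime;
    return 1;
}

/* Hypothetical helper: re-run res_init(3) only when the file changed.
 * `path` is used for the stat(2) check only; res_init() itself always
 * reads the system's resolver configuration.
 * Returns 1 if a rescan happened, 0 if none was needed, -1 if res_init()
 * reported failure. */
int rescan_if_changed(const char *path, time_t *last_mtime)
{
    if (!resolv_conf_changed(path, last_mtime))
        return 0;
    return res_init() == 0 ? 1 : -1;
}
```

A caller would keep one `time_t` per process, initialized to 0, and call `rescan_if_changed("/etc/resolv.conf", &seen)` at whatever points it considers cheap enough. On some systems (Solaris, for example) this needs -lresolv at link time.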
I think we might solve 99% of peoples' concerns by doing the rescan on a toggle of the offline button.
I verified that simply calling res_init() is sufficient to update the ns address table from resolv.conf. Tested on FreeBSD 4-STABLE, RedHat 6.2 and 7.0, and Solaris 2.7. (On Solaris, must use -lresolv). Of course needs some configure bits as well. Agree that doing this in the Offline->Online transition is sufficient (once you know to toggle it ;-)
Excellent! Thanks for investigating this ;-)
(Too bad we don't have a reliable XP solution for auto-detecting when our network connection goes away and comes back.)
cc'ing gordon and wtc
Gagan: I've read the description and comments in this bug report and I have no comments to add.
->taking over to ease his pain... :-)
Assignee: darin → gagan
Status: ASSIGNED → NEW
from mtg w/gagan: move target milestone to 0.9.2
Target Milestone: mozilla0.9.1 → mozilla0.9.2
mass move, v2. qa to me.
QA Contact: tever → benc
Okay...I'll take it.
Assignee: gagan → gordon
I could use some x-head help on this. I've pasted a first take at the proposed patch below, but I need help verifying it's going to build and run on the various flavors of Unix we target. I also think the typedefs shouldn't be needed, but I wasn't able to locate a header file that defined them.
Index: nsDnsService.cpp
===================================================================
RCS file: /cvsroot/mozilla/netwerk/dns/src/nsDnsService.cpp,v
retrieving revision 1.85
diff -r1.85 nsDnsService.cpp
45a46,53
> #if defined(XP_UNIX)
> typedef unsigned long u_long;
> typedef unsigned int u_int;
> typedef unsigned short u_short;
> typedef unsigned char u_char;
> #include <resolv.h>
> #endif
>
928a937,941
> #if defined(XP_UNIX)
> _res.options &= ~RES_INIT;
> int error = res_init();
> NS_ASSERTION(error == 0, "res_init() failed");
> #endif
Gordon, According to the resolver man page on Solaris 8, you are supposed to include the following headers:
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>
I think to get those typedefs, you just need to include <sys/types.h>. Note that modifying the global variable _res and calling res_init() is not thread-safe. The Solaris man page has this to say: These interfaces are unsafe in multithreaded applications. Unsafe interfaces should be called only from the main thread.
Only one thread can be executing nsDNSService::Init() at a time. Wouldn't that be sufficient? It's highly likely that thread is the main UI thread anyway.
It must be called by the main thread only if libresolv.so was compiled without -D_REENTRANT. It only matters to a couple of global variables like 'errno' that are thread-local storage if compiled with -D_REENTRANT. Without reading the libresolv source code, it is safest to follow what the man page says.
Since this has a potential of breaking some Unix platforms, how about we land this on the trunk and bake and then bring it in for the 0.9.2 branch if needed?
Target Milestone: mozilla0.9.2 → mozilla0.9.3
Is this a fix for bug 63564, by putting a rescan into the toggle for Linux only? What about other platforms?
Summary: Mozilla does not recognize DNS server changes (DHCP) → Conn: Mozilla does not recognize DNS server changes (DHCP)
*** Bug 88144 has been marked as a duplicate of this bug. ***
i will take a look.
Assignee: gordon → dougt
Status: ASSIGNED → NEW
I suck. back to gordon.
Assignee: dougt → gordon
*** Bug 89501 has been marked as a duplicate of this bug. ***
*** Bug 93000 has been marked as a duplicate of this bug. ***
+ RELNOTE for NS 6.1: " Mozilla does not recognize changes in DNS servers while running. PPP and VPN users should restart the browser after connecting to a different network. "
how about if we tried running this code on DNS failure? we could then retry the DNS lookup once. this way, we'd get closer to solving the problem of detecting changes to /etc/resolv.conf in at least the false error case. how does this sound?
Sounds plausible. How expensive would that be? Probably not much is my guess.
This would solve the case that users find most annoying and would be a very welcome change. It's not too expensive; certainly minuscule compared to the time that the DNS query took to time out. The bind code is in lib/resolv/res_init.c in the function __res_vinit() and just parses the contents of the environment variable LOCALDOMAIN and the contents of your resolv.conf file (typically just a couple of lines) into internal data structures. The general case (including preventing packets from traversing the whole Internet to get to your old nameserver) is much harder to fix and probably involves watching resolv.conf (and knowing where it is to watch it and ...)
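A rough sketch of the "rescan on failure, then retry once" idea discussed above. This is not Mozilla code: `lookup_with_rescan`, `system_lookup`, `system_rescan` and the two `fake_*` stand-ins are names invented for this example. The real adapters assume getaddrinfo(3) and res_init(3) are available; the fake pair only simulates a resolver that is stale until it is rescanned, so the retry path can be exercised without a network.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>

typedef int (*lookup_fn)(const char *host);   /* returns 0 on success */
typedef int (*rescan_fn)(void);               /* returns 0 on success */

/* Try the lookup; on failure, rescan the resolver configuration and retry
 * exactly once. If the rescan itself fails, report the original error. */
int lookup_with_rescan(const char *host, lookup_fn lookup, rescan_fn rescan)
{
    int rc = lookup(host);

    if (rc == 0)
        return 0;
    if (rescan() != 0)
        return rc;
    return lookup(host);
}

/* Adapter over getaddrinfo(3); the result list is discarded. */
int system_lookup(const char *host)
{
    struct addrinfo hints, *res = NULL;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc == 0)
        freeaddrinfo(res);
    return rc;
}

/* Adapter over res_init(3), assumed to re-read resolv.conf. */
int system_rescan(void)
{
    return res_init();
}

/* Stand-ins for demonstration only: lookups fail until a rescan happens. */
static int fake_resolver_is_stale = 1;

int fake_lookup(const char *host)
{
    (void)host;
    return fake_resolver_is_stale ? EAI_FAIL : 0;
}

int fake_rescan(void)
{
    fake_resolver_is_stale = 0;
    return 0;
}
```

With the fake pair, `lookup_with_rescan("www.example.com", fake_lookup, fake_rescan)` fails on the first attempt, rescans, and succeeds on the retry. Note that the user still pays for the first timeout, and nothing is retried when the old server answers but gives the wrong data.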
*** Bug 95218 has been marked as a duplicate of this bug. ***
*** Bug 90913 has been marked as a duplicate of this bug. ***
+mostfreq - this has been a popular one...
*** Bug 48094 has been marked as a duplicate of this bug. ***
Why not also stat resolv.conf if we're doing a lookup after no lookups have been done for a while (say 5 or 10 minutes)? This would be a minuscule hit, and would catch the common case of somebody changing networks (which usually takes most people longer than 10 minutes), etc., before the first failed lookup (and resulting long timeout period) happens.
Well, speaking for myself, I work on a notebook, using DHCP. So if I change my location in our area (wired to 802.11, to the other building, etc.), my resolv.conf is rewritten every 3 minutes. Not that there need to be changes, but dhclient or pump or whatever recreates it with the 'new' data it retrieves.
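One way to picture the idle-time check is the small sketch below. The struct `dns_idle_tracker` and the function `dns_idle_should_recheck` are hypothetical names, and the threshold is whatever the caller picks (the 5 to 10 minutes suggested above, for example). The sketch only decides *when* to look at resolv.conf; what to do once it has changed is left to the caller.

```c
#include <time.h>

struct dns_idle_tracker {
    time_t last_lookup;      /* 0 means no lookup has been made yet */
    double idle_threshold;   /* seconds of idle time that trigger a recheck */
};

/* Call once per DNS lookup. Returns 1 when the previous lookup is at least
 * idle_threshold seconds old, meaning the caller should stat(2) resolv.conf
 * before resolving. The very first lookup returns 0, because the resolver
 * reads its configuration on first use anyway. Always records `now`. */
int dns_idle_should_recheck(struct dns_idle_tracker *t, time_t now)
{
    int recheck = 0;

    if (t->last_lookup != 0 &&
        difftime(now, t->last_lookup) >= t->idle_threshold)
        recheck = 1;
    t->last_lookup = now;
    return recheck;
}
```

A busy browser that resolves names every few seconds never reaches the threshold, so the extra stat(2) would only be paid after a pause.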
Gordon/Gagan - Where are we on this one? This looks like a nice to have, but not a stop ship. If this is the case, please mark it as nsbranch- for this round.
We don't have a current patch proposal, so I'm changing this to nsbranch-. It shouldn't be too hard to get something that will work, but we'll need more bake time. This should be doable in the 0.9.5 timeframe.
er, maybe we can get to it in 0.9.6.
Target Milestone: mozilla0.9.5 → mozilla0.9.6
I have: 500 MHz Indigo iMac, 384 MB RAM, OS 9.2.1 and OSX 10.1, 56K internal modem, CD-RW, Lexmark Z53, Earthlink ISP, dial-up service, Mozilla OSX 0.9.4 Build ID: 2001091313. I'm not sure if this is a duplicate or not, but Brad Baetz says it is. The problem on initial launch can be reproduced by double-clicking Mozilla OSX 0.9.4 with the phone connection in a disconnected state. Mozilla brings the start page URL into the window, it dials the ISP, connects, then sits there trying to resolve the host, and eventually says that it can't find it. The reload button doesn't work at this point. You can click on the URL window and then press return, and Mozilla will go to the URL site. I have http://start.earthlink.net set as my home page. It doesn't seem to be site dependent, however. This can happen on any site that you try to go to while your phone connection is disconnected. I have just downloaded and tested Mozilla OSX 0.9.5 (Fizzilla?) and the problem is still there.
This is arguably a performance enhancement, so I'm setting the target for 0.9.7.
Target Milestone: mozilla0.9.6 → mozilla0.9.7
isn't this a duplicate of bug 26718?
*** Bug 117242 has been marked as a duplicate of this bug. ***
The DNS change problem still exists in Mozilla (Windows version, build ID: 2001112009). When reconnecting a PPP session from one ISP to another with a different DNS server, Mozilla must be restarted to solve the problem.
*** Bug 117613 has been marked as a duplicate of this bug. ***
*** Bug 117628 has been marked as a duplicate of this bug. ***
Looks like lots of dupes on this one, nominating for nsbeta1
Removing nsbeta1 nomination because this bug has been plussed.
*** Bug 115603 has been marked as a duplicate of this bug. ***
*** Bug 130505 has been marked as a duplicate of this bug. ***
*** Bug 88144 has been marked as a duplicate of this bug. ***
*** Bug 26718 has been marked as a duplicate of this bug. ***
I change Platform/OS to All/All because bug 26718 was marked so (this bug does occur on Windows and MacOS too)
OS: Linux → All
Hardware: PC → All
if we are turning this bug into an all-plat bug, do we have an idea of how to implement this on each platform? UNIX uses /etc/resolv.conf, I don't know how the Mac or Windows stacks update their resolver lists, or how mechanisms change the setting (DHCP, dialup, etc.)
Hi, I can confirm the bug running FreeBSD 4.5-STABLE with multiple dialup ISPs. What about this idea: whenever a DNS lookup fails, /etc/resolv.conf is re-read. This will minimize the extra cost of the update and solve the problem entirely (at least to my mind). Regards, Simon
I think we need both the DNS failure -> rescan AND the Offline|Online button solutions. The auto-rescan on DNS failure would not solve problems where the old DNS server is visible, but does not serve all domains needed for the new network (VPN users or companies that use shadow DNS domains would have this problem).
Frankly, while a rescan after a failed lookup makes the problem slightly better, this does not, IMO, actually qualify as a "solution" because the initial failure can take a fairly long time to time out, and unlike other timeouts, one can't even do other web accesses in other windows, etc., because they too require DNS lookups which haven't yet been rescanned and so have to time out also, and so on. If this bug is actually to be considered fixed, we need to address it in such a way that it becomes invisible to most if not all users, and that solution simply doesn't come close. Should we rescan after a DNS failure? Yes, definitely. This should really be a no-brainer. Should we (continue to) rescan after an offline-online switch? Yes, again, definitely. We can never be assured of doing things perfectly, so there should always be a way for the user to force a rescan, and this seems an obvious way to do it. However, IMO, we should also (as I recommended many months ago) automatically rescan if we have not done any DNS lookups in a while (a few minutes or so). The extra hit of checking the DNS configuration in this case is minuscule and will not be noticed by anyone, and it would fix most cases of DNS server changes, which are often accompanied by some system idle time (moving a computer around, overnight configuration updates, etc.)
I'd be interested in finding out how other applications use res_init. I remember reading about it in DNS & Bind, 2nd edition (a long time ago), but it did not give any practical discussion of how to use it. Not many programs run as applications that persist through multiple network changes. I suppose someone could change the code so it rescans every time, then run a page load test to see how it affects performance.
*** Bug 132970 has been marked as a duplicate of this bug. ***
*** Bug 133359 has been marked as a duplicate of this bug. ***
It seems that on linux debian testing, using the 0.9.9 mozilla version, the workaround (going offline/online) doesn't work anymore. Version 0.9.8 was running OK.
*** Bug 134145 has been marked as a duplicate of this bug. ***
*** Bug 143585 has been marked as a duplicate of this bug. ***
I have fixed this problem. See 117628.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
*** Bug 145953 has been marked as a duplicate of this bug. ***
*** Bug 141654 has been marked as a duplicate of this bug. ***
*** Bug 156661 has been marked as a duplicate of this bug. ***
Verified per comment #69. Please reopen bug 117628 if there is still a problem.
Status: RESOLVED → VERIFIED
QA Contact: benc → junruh
*** Bug 199929 has been marked as a duplicate of this bug. ***