Closed Bug 146769 Opened 22 years ago Closed 21 years ago

DNS: gets stuck resolving host

Categories

(Core :: Networking, defect)

defect
Not set
major

Tracking

VERIFIED WORKSFORME

People

(Reporter: swordedge, Assigned: gordon)

Details

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.0rc2) Gecko/20020510
BuildID:    2002051021

Basically, Mozilla is looking up the IP address of a web site and gets stuck
doing that.  When this happens, I can use nslookup on a command line and get a
response back before I get my finger off the return key.  So something in
Mozilla is, on occasion, getting stuck with this operation.  This problem is
intermittent.  I have seen deleting the history.dat file suggested as a solution,
but that doesn't work.  Nor is it a good solution.  The address works fine again
after the original operation times out in something like four minutes, which is
way too long.



Reproducible: Sometimes
Steps to Reproduce:
1. Go to any web page.

Actual Results:  Intermittent; can't reliably reproduce it.

Expected Results:  When this happens, the browser should start retrieving the
page, not say "Resolving host" for many minutes.

I have seen this on Windows and OS/2.  I consider this severe, as it looks really
bad when Mozilla can't find an IP address.  A major Mozilla impression hit.
-> Networking
Assignee: Matti → new-network-bugs
Component: Browser-General → Networking
QA Contact: imajes-qa → benc
This could be a DNS or a more general networking bottleneck. When this happens,
can you open a new window and go someplace else via ip address?

Can you give some specific steps that produce this problem?
I didn't realize the email address was bogus...



If I open a command line and do an nslookup, I get the IP address
immediately, before I can get my pinky off the enter key.  If I cut and
paste the address that nslookup found into the URL line of a new browser
window, the browser goes there pronto.  This is definitely some sort of
Mozilla DNS lookup problem.  I have seen this behavior on both OS/2 and
Windows, so it should be in common code.
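
For anyone who wants numbers rather than "before I can lift my finger", here is a minimal sketch that times a lookup through the operating system's resolver. It assumes Python is available on the test machine; `time_lookup` is a made-up helper name, and the script only measures the system resolver, not Mozilla's own DNS code.

```python
import socket
import sys
import time


def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Return (elapsed_seconds, addresses) for one resolver call.

    addresses is a sorted list of unique IP strings, or None if the
    lookup failed. The resolver argument exists only so the function
    can be exercised without network access.
    """
    start = time.monotonic()
    try:
        results = resolver(hostname, None)
        # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the address string for both IPv4 and IPv6.
        addresses = sorted({entry[4][0] for entry in results})
    except OSError:  # socket.gaierror is a subclass of OSError
        addresses = None
    return time.monotonic() - start, addresses


if __name__ == "__main__":
    # Usage: python time_lookup.py www.example.com another.example.org
    for name in sys.argv[1:]:
        elapsed, addresses = time_lookup(name)
        print(f"{name}: {elapsed:.3f}s -> {addresses}")
```

Running this while the browser is stuck on "Resolving host" would show whether the system resolver answers quickly for the same name at the same moment.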

As for specific steps, good question.  The problem is very intermittent.
Sometimes I will see it several times a day.  Then I might not see it for
several days.  I do not know of a specific sequence of events that makes
the problem occur.

If I leave the browser to time out (what, four or five minutes?), it puts
up the can't find URL error window.  If I then go right back to the same
URL, it usually goes immediately to it.  When the problem is occurring, I
can open a new browser instance and plug in any URL I want.  If that URL
is anything other than an IP address, it gets stuck too.  If I click on a
link on the site that has any URL in it (absolute links), including links
to itself, it gets stuck during instances when the problem is occurring.
It looks like Mozilla does not remember any IP address that it has
previously seen during this session (they aren't likely to change while you
have the browser open!)




More info....




New data...

It happened again.  So I looked up the site with nslookup, opened a new browser
window, used the IP address, and went to the site.  Worked great (no
absolute links).  About five minutes later, up popped a window asking me to
save the file that I had clicked on before the DNS problem occurred.  I
successfully saved the file.  Went to the tab that was halted.  It had the
page on it perfectly.  It just took something like five minutes to
figure out where the page was located.  There was no error message about not
finding the page; it loaded instead.


This bug seems to have something to do with opening a link in a new window.
When this occurs, current downloads stop and no page in any window can find a
host until it clears in roughly five minutes.
I have noticed exactly the same problem on Linux (MZ build 2002060904),
except that it does not take 5 minutes for Mozilla to resolve the host
(rather 1 minute or so) and that it seems to have no link with
"opening a link in new window".
Tcpdumps suggest that the resolutions that mozilla is trying aren't the ones
that the status bar is complaining about. Is resolution in mozilla
single-threaded such that one stuck resolution can block the entire queue? I'm
starting to think the problem is bad inline images on a page, not the page itself...
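
The question above (can one stuck resolution block the whole queue?) can be illustrated with a small simulation. This is only a sketch of head-of-line blocking in general, not Mozilla's actual DNS code: the host names, the delays, and the `completion_times` helper are all invented for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical lookup delays in seconds. "stuck.example" stands in for a
# lookup that hangs, for example the host of a bad inline image.
DELAYS = {"stuck.example": 0.5, "fast-a.example": 0.01, "fast-b.example": 0.01}


def fake_resolve(host):
    """Pretend to resolve a host by sleeping for its configured delay."""
    time.sleep(DELAYS[host])
    return host


def completion_times(hosts, workers):
    """Queue lookups in order; return seconds until each one finished."""
    start = time.monotonic()
    finished = {}

    def run(host):
        fake_resolve(host)
        finished[host] = time.monotonic() - start

    # Leaving the "with" block waits for every queued lookup to finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for host in hosts:
            pool.submit(run, host)
    return finished


if __name__ == "__main__":
    hosts = ["stuck.example", "fast-a.example", "fast-b.example"]
    for workers in (1, 3):
        print(workers, "worker(s):", completion_times(hosts, workers))
```

With one worker, the fast lookups cannot complete until the slow one ahead of them does, which would match a status bar complaining about a host other than the one that is really stuck. With several workers the fast lookups finish on their own schedule.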
I believe this problem has been seen and reported to DNS before...
Whiteboard: dupeme
I'm seeing a similar problem to the one described here. I'm not sure if there's
another bug open for it; I didn't see one.

There seems to be a small window of time (in all likelihood, during the DNS
resolution) in which hitting "stop" for a page load will make all Mozilla DNS
lookups block for some time. For example, I just had the following happen:

1. Type http://www.cnn.com/ in the location bar and hit enter
2. Almost immediately hit "stop"
3. All subsequent attempted page loads which required new (non-cached) DNS
information hung for a couple of minutes. Then everything went back to normal.
Additional page loads from sites I'd already been at (thus, DNS info was
probably cached) worked fine. nslookup on the hostnames for the blocked load
attempts also worked fine.
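
The observation that already-visited sites kept working is what a per-session DNS cache would produce. A minimal sketch of such a cache follows, assuming a fixed time-to-live per entry; `DnsCache` and its parameters are hypothetical names, and nothing here is taken from Mozilla's implementation.

```python
import time


class DnsCache:
    """Tiny in-memory host -> addresses cache with a fixed time-to-live."""

    def __init__(self, resolver, ttl_seconds=60.0, clock=time.monotonic):
        self._resolver = resolver      # callable: host -> list of addresses
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so expiry can be tested
        self._entries = {}             # host -> (expiry_time, addresses)

    def lookup(self, host):
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and entry[0] > now:
            return entry[1]            # cache hit: the resolver is not called
        # Cache miss or expired entry: this call may block or raise.
        addresses = self._resolver(host)
        self._entries[host] = (now + self._ttl, addresses)
        return addresses
```

In a design like this, only cache misses reach the resolver, so a blocked resolver would stall new host names while cached ones keep loading, which is consistent with the symptoms in step 3.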

I see this a couple of times a week on average. Currently I'm using build
2002082608 on FreeBSD, but I've seen this (or similar problems) on builds for a
long time and also on MacOS X.

Is there a more appropriate bug for this issue?
Sean: please file a new bug on your issue. I have seen this myself, but not
given it enough attention to file a bug.
Done: Bug 164988
I'm interested in trying to understand this better. Sean filed a bug about
problems w/ stop not unlocking the DNS service. Can we get steps for the "Open New
Window" problem here?
Summary: gets stuck resolving host → DNS: gets stuck resolving host
I ran into this problem only after upgrading from Mozilla 1.1B to 1.1 on 
Windows 2000.  It will NOT resolve any host address and after 30 seconds 
displays an alert that the connection was refused.  All subsequent attempts are 
immediately met with that same alert.  IP addresses produce the same result.  
IE 5.5 launches and resolves the URL almost instantly.
I also have this problem.  I had it with mozilla 1.0rc3 on FreeBSD 4.6.2 and
still have it with mozilla 1.1 on FreeBSD 4.7.  My work around is to quit then
restart mozilla.
gordon, can you look at this.
Assignee: new-network-bugs → gordon
I have also experienced this or a similar bug on Mozilla 1.0 running on
Slackware Linux 8.1. Galeon 1.2.5 also similarly affected. But this problem only
seems to have developed recently for me...so that could possibly be a clue that
some configuration change could be affecting this...or maybe not. :)
I experience this as well .. several times a day at the moment.

All is well if I restart the browser, or a new mozilla process or if I submit
the query in opera.

It's d*** frustrating and eats CPU.

I run Mozilla 1.2B on a linux 2.2.18 machine.
I should have mentioned that when this bug occurs I can refresh existing pages
just fine, but cannot access other pages not currently open.
I also see this.

Debian unstable, kernel 2.4.19

I see it with Mozilla 1.1 and with 2002111218 (mozilla-snapshot package).

It is *extremely* annoying, and seems quite unnecessary: the problematic host
name is always (instantly) resolvable with "host" or "nslookup", but mozilla
hangs for a minute. No pages can be opened at all, until the first one is found
(then all I have tried to open come at once...)

I also note that File->Exit leaves the Mozilla process around (but the window
disappears). This means I cannot run mozilla again (the Debian scripts try to
remote-control existing instances first). So I first must kill all mozilla
processes, then re-run.

Seems therefore that something very synchronous is happening... can't even be
exited!

Anyway, I see this as a severe bug, as it affects me almost daily!

/Mikael
When I get this bug, which after all this time I still have and still can't
force to happen, I kill mozilla.  I can't start a new one till I run a process
killer called watchcat.  According to it, mozilla is still running, 99% CPU
usage and running a process called BufferCreator.

Kill that and I can start mozilla again.  Whatever BufferCreator is doing, it
needs to be fixed.
I'm seeing this on 1.2.1 under MacOS 9.2.2

For me it seems to happen when I leave the browser running for a few days.  I
open a new window and try to go to google (for whatever reason that's the only
site this happens to me with).  I type a few letters into the url box in the nav
bar at the top and hit enter.  The little info thing at the bottom of the window
just sits there with "resolving host" for minutes and minutes.  I've never seen
it even time out after a long time.  Opening a new window won't help either.  I
can go to other new sites (e.g. hotbot worked for me all weekend while this
problem was happening).  I ran Internet Explorer to see if the problem was
something other than Mozilla, and it loaded google fine.  Jumped over to Mozilla
thinking, oh maybe google's working again, but no, on Mozilla it still wouldn't
load.  Quitting the browser fixed the problem, though it took me three days
before I asked someone and they suggested it.
I seem to have managed to get around this by creating a completely new profile
with the same properties as the old one.

With this new profile the problem has never occurred, whereas with the
previous profiles it occurs constantly.
can you preserve the old profile?
Same goes for me, never had a Mozilla browser that actually has *ever* worked 
on my WIN98 SE computer, and this latest version that is supposed to be stable 
is yet another. Heard lots of great things about it, but worthless if it 
doesn't work. I've got 64MB RAM, lots of HD space and nothing on the install 
appears to be out of place.
mozilla 1.2.1, redhat linux 9.
I see this after mozilla has been running for about two or three days.
For no apparent reason it gets stuck in the 'Resolving host ...' stage when a
new site is accessed. Existing pages can be refreshed OK. URLs with numeric
addresses work OK. Other browsers and DNS lookups are fine when this is
happening. Exiting mozilla leaves a couple of processes hanging around which
have to be killed manually before mozilla can be restarted. After that, it all
works fine for another two days or so.

The problem seems to affect mozilla mail too. While it's happening you can't send
email messages; it gets stuck contacting the SMTP server.

When this is happening, strace of the main (?) mozilla thread shows this...

read(3, "\372", 1)                      = 1
gettimeofday({1060236565, 665000}, NULL) = 0
ioctl(5, FIONREAD, [0])                 = 0
poll([{fd=5, events=POLLIN}, {fd=12, events=POLLIN}, {fd=8, events=POLLIN}, {fd=3, events=POLLIN}], 4, 0) = 0
gettimeofday({1060236565, 665705}, NULL) = 0
gettimeofday({1060236565, 665852}, NULL) = 0
write(5, "5\30\4\0\375Dl\2H\0\0\0\21\0\21\0;\3\5\0\302@l\2\0\0\0"..., 848) = 848
ioctl(5, FIONREAD, [0])                 = 0
poll([{fd=5, events=POLLIN}, {fd=12, events=POLLIN}, {fd=8, events=POLLIN}, {fd=3, events=POLLIN, revents=POLLIN}], 4, -1) = 1
gettimeofday({1060236565, 714604}, NULL) = 0
gettimeofday({1060236565, 714724}, NULL) = 0
gettimeofday({1060236565, 714768}, NULL) = 0
gettimeofday({1060236565, 714940}, NULL) = 0
read(3, "\372", 1)                      = 1
... looping forever, with pretty much no delay in the poll, read or write as far
as I can tell. I've let it run for 1/2 an hour and it remains stuck until you
hit the stop button.
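
To compare traces like this across machines, a per-syscall tally can make a tight loop easier to spot. The following is a rough sketch that counts the syscall name at the start of each line of saved strace output; `syscall_counts` is a made-up helper, and it assumes each call sits on one line (continuation lines from wrapped output are ignored).

```python
import re
import sys
from collections import Counter

# A syscall line in strace output starts with the call name followed by "(".
SYSCALL_RE = re.compile(r"^(\w+)\(")


def syscall_counts(strace_text):
    """Count syscall names at the start of each strace output line."""
    counts = Counter()
    for line in strace_text.splitlines():
        match = SYSCALL_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Usage: python syscall_counts.py saved_trace.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for name, count in syscall_counts(handle.read()).most_common():
            print(f"{count:8d}  {name}")
```

A trace dominated by read, poll, write, and gettimeofday in roughly equal proportions, with no network calls, would fit the loop described above.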

The FDs shown by lsof are...
mozilla-b 22911 djh900    3r  FIFO        0,5            847808 pipe
mozilla-b 22911 djh900    4w  FIFO        0,5            847808 pipe
mozilla-b 22911 djh900    5u  unix 0xc239aac0            847809 socket

There are a couple of other threads/processes too, one of which (IIRC) is stuck
in a futex().

I'm pretty sure tcpdump shows a flurry of packets at the start of all this
(after entering the URL), but shows no traffic during the rest of the 'stuck' state.

My ~/.mozilla was initially created by mozilla 1.0x.
FWIW, nscd is not running and resolv.conf is...
search anu.edu.au
nameserver  130.56.4.1
nameserver 150.203.1.10
nameserver 150.203.22.28
nameserver 150.203.35.2

Thanks.


Just to add one more experience to the list...

Mozilla 1.5a
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5a) Gecko/20030718

on RedHat 9 

I see the (what appears to be the...) same problem -- Mozilla decides (sometimes...) to take a vacation with "Resolving host www.foo.com..." at the bottom of the window. Dig and host both answer IPv6 and IPv4 queries for www.foo.com just fine while Mozilla is stuck on 'planet mozilla'...

Best wishes...
I've had this problem happen to me with 1.2.1, 1.4, and now 1.5b on RH9. Exact
same symptoms, and exiting normally leaves behind a mozilla-bin process that has
to be SIGKILLed.

Why hasn't this bug been assigned? It's a show-stopper when it happens. 
FWIW this is also a problem in Firebird 0.6.1 and 0.7 on RH9. Stop, kill -9 pid,
restart, sigh.
I suspect this has been fixed now that the DNS rewrite landed (bug 205726).
Please test against a trunk build.  0.7 is based on Mozilla 1.5, which does not
include the DNS rewrite.

marking WORKSFORME
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → WORKSFORME
VERIFIED/WFM:
DNS cleanup, since rewrite has been 1.6a->1.6f.

Regressions or new problems need new bugs.
Blocks: 205726
Status: RESOLVED → VERIFIED
Whiteboard: dupeme
I still see this in 1.7.3.  Bug 260832 seems to describe the same problem, so if
benc wants to throw away the info accumulated in this bug, people still seeing
the problem might want to head over that way.
Akkana, put a reference back to this bug if you like, but if this is a
persistent problem after a rewrite, why use the same bug? The first report here
was for pre-1.0 on OS/2.