Closed Bug 53967 Opened 24 years ago Closed 21 years ago

"cannot be found" errors w/ washingtonpost.com (and others)

Categories

(Core :: Networking, defect, P4)

x86
All
defect

RESOLVED DUPLICATE of bug 68796

People

(Reporter: ponyisi, Assigned: gagan)

Details

(Keywords: verifyme)

If one browses washingtonpost.com enough, dialog boxes pop up saying that
"ad.doubleclick.net cannot be found".  However, nslookup finds it, and for a
while afterwards, Mozilla finds the site as well.  I suspect Mozilla is having
some kind of conflict with the local resolver cache.  (Build 2000091312, glibc
2.1.3)
I've seen this on my Win98 System, too.    If you run Debug -> Browser Buster
long enough, you'll get an error saying that a host could not be found, such as
"random.yahoo.com"

Some sort of TCP/IP leak perhaps??
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
dns->gordon
Assignee: gagan → gordon
*** Bug 53969 has been marked as a duplicate of this bug. ***
I get this on Debian 2.2 when trying to browse http://news.bbc.co.uk/, first
try, every time, using Build ID 2000120708.

This is what a normal (ping news.bbc.co.uk) DNS request looks like:

00:08:45.719969 seth.lan.1050 > ariel.lan.domain:  36243+ A? news.bbc.co.uk. (32) (ttl 64, id 15702)
00:08:45.992356 ariel.lan.domain > seth.lan.1050:  36243 q: news.bbc.co.uk. 2/2/2 news.bbc.co.uk. CNAME newswww.bbc.net.uk., newswww.bbc.net.uk. A www3.thny.bbc.co.uk (170) (ttl 64, id 35019)

And this is what Mozilla's DNS request looks like:

00:07:27.243135 seth.lan.1050 > ariel.lan.domain:  45355+ AAAA? news.bbc.co.uk. (32) (ttl 64, id 15697)
00:07:27.849984 ariel.lan.domain > seth.lan.1050:  45355 NXDomain q: news.bbc.co.uk. 1/0/0 news.bbc.co.uk. CNAME newswww.bbc.net.uk. (76) (ttl 64, id 35013)
00:07:27.851486 seth.lan.1050 > ariel.lan.domain:  45356+ AAAA? news.bbc.co.uk.lan. (36) (ttl 64, id 15698)
00:07:27.854231 ariel.lan.domain > seth.lan.1050:  45356 NXDomain* q: news.bbc.co.uk.lan. 0/1/0 (102) (ttl 64, id 35014)
00:07:27.856575 seth.lan.1050 > ariel.lan.domain:  45357+ A? news.bbc.co.uk. (32) (ttl 64, id 15699)
00:07:27.859196 ariel.lan.domain > seth.lan.1050:  45357 NXDomain q: news.bbc.co.uk. 1/0/0 news.bbc.co.uk. CNAME newswww.bbc.net.uk. (64) (ttl 64, id 35015)
00:07:27.859436 seth.lan.1050 > ariel.lan.domain:  45358+ A? news.bbc.co.uk.lan. (36) (ttl 64, id 15700)
00:07:27.862105 ariel.lan.domain > seth.lan.1050:  45358 NXDomain* q: news.bbc.co.uk.lan. 0/1/0 (102) (ttl 64, id 35016)

It seems like the DNS code is getting confused by the CNAME. However, this
obviously does not affect all CNAMEs (I tried creating a virtualhost with a
CNAME and it worked fine).

The additional NXDomain in response 45357 is interesting.
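
For anyone reading the capture, the query order in the Mozilla trace (AAAA before A, each name tried first as given and then with the local search domain appended) can be modelled with a short Python sketch. This only reproduces the order seen in the trace; it is not the resolver's actual code. The function name `query_sequence` and its parameters are made up for illustration, and the search domain `lan` is taken from the trace.

```python
def query_sequence(hostname, search_domains, record_types=("AAAA", "A")):
    """Return (record_type, query_name) pairs in the order seen in the trace.

    Hypothetical helper: it models the capture only, not any real resolver.
    """
    name = hostname.rstrip(".")
    queries = []
    for rtype in record_types:
        # The name is tried as given first...
        queries.append((rtype, name + "."))
        # ...then once per search domain (here: "lan").
        for domain in search_domains:
            queries.append((rtype, f"{name}.{domain.strip('.')}."))
    return queries


if __name__ == "__main__":
    for rtype, qname in query_sequence("news.bbc.co.uk", ["lan"]):
        print(f"{rtype}? {qname}")
```

Under these assumptions the four generated queries line up with requests 45355 through 45358 in the capture.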
*** Bug 65924 has been marked as a duplicate of this bug. ***
Target Milestone: --- → mozilla0.9.1
Whiteboard: [DNS]
I couldn't reproduce this with my latest Linux build.

Ben, can you try to reproduce this on your win98 build, and update the bug with
your results?  Thanks.
Target Milestone: mozilla0.9.1 → mozilla0.9.2
qa to me.

Gordon is probably interested in the Win98 + buster report, so I'll update that
first:

Running buster over the last few days revealed several problems with buster. One
of them was that the buster list included some hostnames that don't exist
anymore. They produced DNS errors which I validated as correct. They have been
removed from buster now, so new DNS errors are probably valid. I am running
buster now and will report anything I find here.

I'll look at the washpost and bbc problems later.

QA Contact: tever → benc
Keywords: qawanted
Still happening with build 2001052411, when I go to http://news.bbc.co.uk/.
Can't get any dialogs to pop up on http://www.washingtonpost.com/ but then none
of the main banner ads come up either, and Mozilla spends an awful long time
loading the page.

Another URL to try which will bring up several dialogs with
"ad.nz.doubleclick.net cannot be found" is http://www.stuff.co.nz/

(Note that all this only applies if you are not using a proxy.)
Well, it doesn't look like it's DNS that's failing.  Even with a DNS cache (on
the DNS_BRANCH) I was eventually able to get pageloads to start failing
(NS_ERROR_FAILURE - big help) by continuing to browse www.washingtonpost.com.

The next thing to check is to see if the HTTP connection is failing somehow.
Summary: DNS fails when it shouldn't → URL fails to load when it should
Whiteboard: [DNS]
taking over for investigation
Assignee: gordon → gagan
changing summary.

If we are going to pursue this, can we analyze each site in a different bug?

I'd like to move the bbc problems somewhere, esp since I have questions about
the log provided by Martin (and why mozilla kicks off "AAAA?" requests).
Keywords: qawanted
Summary: URL fails to load when it should → "cannot be found" errors w/ washingtonpost.com (and others)
Hmmm... after some investigation and having tried both washingtonpost and bbc
for quite a while I am still unable to reproduce this on any of the platforms. I
am reducing the priority of this bug for this reason. If someone can help benc
provide a consistently reproducible case then we can try addressing it.
Priority: P3 → P4
Two comments.

First, this happens to me occasionally with Netscape on RedHat Linux 6.2; it
seems to happen with about the same frequency as with Mozilla.

Second, I wanted to mention for the record (although I assume everybody involved
with this bug knows) that the AAAA queries are asking for IPv6 addresses, which
is probably what's messing things up.  If anybody knows of code that involves
IPv6 addresses, that's probably the trouble spot.
I think this is a DNS query performance problem of some kind, and that we are on
the right track. I lack practical experience here, but this is what I am thinking:

1- The fact that we kick off IPv6 address queries does result in a different response.
This suggests that the resolver code we use (and its timeout and retry
parameters) might be different.

2- Really slowly hosted domains AND/OR really crummy first line DNS servers.

I know my PacBell DNS servers have really horrible performance, they even time
out to www.news.com and the domains we have discussed here. Usually, if you
wait 30 seconds, and try again, it works, so I think the first query is failing
but priming the cache...

That means this could really be a pain to isolate. I need to configure my own
DNS server, but haven't yet.

Any other ideas?
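
The "first query fails but primes the cache, second one works" pattern described above can be tested with a small retry wrapper. This is a rough Python sketch, not anything Mozilla does: `resolve_with_retry` and its parameters are hypothetical names, and the 30 second default simply echoes the wait mentioned in this comment.

```python
import time


def resolve_with_retry(hostname, resolver, attempts=3, delay=30.0, sleep=time.sleep):
    """Call resolver(hostname) up to `attempts` times, pausing between failures.

    `resolver` is any callable that returns an address or raises OSError
    (socket.gaierror is a subclass). Returns (address, attempt_number).
    All names here are illustrative.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return resolver(hostname), attempt
        except OSError as err:
            last_error = err
            if attempt < attempts:
                sleep(delay)  # give the upstream cache time to fill
    raise last_error
```

Passing `socket.gethostbyname` as `resolver` and logging the returned attempt number would show whether a host only resolves on the second try, which is what the slow upstream server theory predicts.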
off to 1.0 till I can get more reproducible errors on this. 
Target Milestone: mozilla0.9.2 → mozilla1.0
Bugs targeted at mozilla1.0 without the mozilla1.0 keyword moved to mozilla1.0.1 
(you can query for this string to delete spam or retrieve the list of bugs I've 
moved)
Target Milestone: mozilla1.0 → mozilla1.0.1
I see this all the time at www.nyt.com, www.wsj.com and www.washingtonpost.com.

I'm using build 20020222503.  

This bug may not seem to be a big deal, but it gives the browser the "feeling"
of not loading pages properly, or of not being able to finish loading at all.
Be sure that you don't run 2 Mozilla instances...
I have seen this, when I have a firewall like Zone Alarm turned on.
Could it be related?
Is it being done on purpose? (Block third party sites, turned on in Zone Alarm)
The UK problem might be related to IPv6 problems I don't understand well enough
to explain here. Otherwise, this bug is old enough that it should probably be
WFM and put to rest.
Whiteboard: dupeme
I get this behaviour frequently under Mac OS X (1.2 Build ID 2002101612) with
the more high-traffic sites. E.g. I get it repeatedly when trying to access
ftp.mozilla.org (and we _know_ that exists :0) and just now got it on my first
attempt at www.usps.gov.

I believe Mozilla is erroneously treating a site's "Busy" status as "site
non-existent".
What DNS result are you specifically thinking happens? A busy DNS server would
timeout, which is the same as it being down. We have some discussion of that in
bug 164715, but it is not going well.
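
To make the distinction concrete: at the `getaddrinfo` level a temporary failure and a missing name are usually reported with different error codes, so a client can in principle tell them apart. The Python sketch below is an illustration only; it is not Mozilla's code, the function names are hypothetical, and it assumes the platform's `socket` module exposes `EAI_AGAIN` and `EAI_NONAME`. Which code a given resolver returns for a timeout is platform dependent.

```python
import socket

# Codes that usually mean "try again later" rather than "no such host".
# Assumption: the platform's socket module defines these constants.
TRANSIENT = {socket.EAI_AGAIN}
NOT_FOUND = {socket.EAI_NONAME}


def classify_lookup_error(err):
    """Map a socket.gaierror to 'transient', 'not_found', or 'other'."""
    code = err.args[0] if err.args else None
    if code in TRANSIENT:
        return "transient"
    if code in NOT_FOUND:
        return "not_found"
    return "other"


def lookup(hostname):
    """Return ('ok', addresses) or (classification, None) for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror as err:
        return classify_lookup_error(err), None
    return "ok", sorted({info[4][0] for info in infos})
```

A "cannot be found" dialog would only be accurate for the `not_found` case; the `transient` case is closer to the "busy" situation described above.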

I'm going to dupe this to bug 164715, because that is where a discussion of
handling DNS errors is sort of happening.

This bug has too many reproducible problems, and not enough nslookup analysis
for each case. I'm going to freely admit here that my DNS technology is about 5
years old (DNS & BIND, 2nd edition). My manager has ordered the 4th edition, so
I need to read it and re-write the DNS tests to reflect more OS resolvers and
newer technology.

If anything is breaking for people, please move them to separate bugs, and we'll
hoe through them there.

Martin did a great job w/ the IPv6 scenario, but if that still happens, we'll
still need to move it to a new bug.

*** This bug has been marked as a duplicate of 164715 ***
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → DUPLICATE
Whiteboard: dupeme
Ben, this is NOT a duplicate of bug 164715.  See my comment 10 above.  If this
occurs with cached DNS entries (either by nsDnsService or nsSocketTransport)
then it isn't a problem with DNS failing.  It could be a problem opening new
connections to the server.
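
One way to check this outside the browser is to run the two stages separately and report which one fails. The following Python sketch assumes nothing about Mozilla internals; `diagnose` and its parameters are hypothetical names, and the defaults use only the standard library.

```python
import socket


def diagnose(hostname, port, resolve=socket.gethostbyname,
             connect=socket.create_connection, timeout=10.0):
    """Report which stage fails: 'dns', 'connect', or 'ok'.

    `resolve` and `connect` are injectable so each stage can be
    exercised on its own; the defaults come from the socket module.
    """
    try:
        address = resolve(hostname)
    except OSError as err:  # socket.gaierror is a subclass of OSError
        return "dns", str(err)
    try:
        conn = connect((address, port), timeout)
    except OSError as err:
        return "connect", str(err)
    conn.close()
    return "ok", address
```

If a run against an affected site returns `connect` while the name resolves, that would support the idea that the "cannot be found" message is being shown for a non-DNS failure.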

Vicki, when you have problems with ftp.mozilla.org, are you accessing it via ftp:
or http: ?
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Gordon: you said you reproduced the problem in #10, did you get the same error
message? Does that mean Vicki's interpretation in #23 is correct, that we are
getting DNS errors for non-DNS problems?
removed milestone target.

I'm not convinced this problem has been solved or what the cause was. We have a
lot of DNS bugs that seem to be un-reproducible right now. Both sites seem to be
mentioned in bugs on a periodic basis.

I could imagine this would be a temporary remote DNS failure (Bug 65924).
Target Milestone: mozilla1.0.1 → ---
I'm seeing a similar problem, although I'm not sure it's DNS related. When
browsing washingtonpost.com, I often come to a place where Mozilla is trying to
load a page, but a page never renders, and "Transferring data from
ad.doubleclick.com" stays at the bottom. This occurs regularly, but I cannot
reproduce it at will.

This is on windows 2k, on versions 1.2b and 1.3a, at least.
I should add that it mostly happens on this URL:
http://www.washingtonpost.com/wp-dyn/print/
WRT comment #28, Version 1.3b seems to be generally much crisper in this regard,
but I still see this.  The only case of this hanging "Transferring data from
ad.doubleclick.net..." I've seen has been on a page that was having a pop-up
blocked:  

http://www.washingtonpost.com/wp-dyn/articles/A12818-2003Mar11.html

The "Transferring data from" message stayed for a long time (I think I let it sit
for 15 minutes), but as soon as I opened a new window in another tab, it went away.

WRT the original problem definition, "ad.doubleclick.net cannot be found", I
have not seen that type of behavior in quite some time.

Is anyone else seeing the "ad.doubleclick.net cannot be found" dialog?
Also, just noticed that at http://www.daybydaycartoon.com/ I got the hanging
"Transferring data from" message, but this time it was "Transferring data from
ads.zdnet.com.com".  Could something be appending an extra .com under some
circumstances, and is that why they never connect?

Forgot to mention that comment #30 refers to Mozilla 1.3b.  This one was seen
with 1.0.2.

Sorry for two e-mails with so small an interval.
RESOLVED/fixed:

This was probably some combination of DNS cache flakiness + the DNS hang over
TCP transfer problems.

Darin has probably fixed all of the problems.

New problems to new bugs, including the 1.3b problem Scott described in the last
two comments, please.

Status: REOPENED → RESOLVED
Closed: 22 years ago → 21 years ago
Depends on: 205726
Keywords: verifyme
Resolution: --- → FIXED
This should really have been resolved as WORKSFORME unless you can specify the
patches / bugs that actually fixed things.
Status: RESOLVED → REOPENED
No longer depends on: 205726
Resolution: FIXED → ---
This is what happens when you get behind on bugmail.

*** This bug has been marked as a duplicate of 68796 ***
Status: REOPENED → RESOLVED
Closed: 21 years ago → 21 years ago
Resolution: --- → DUPLICATE