Closed Bug 214625 Opened 21 years ago Closed 8 years ago

DNS: long lines in /etc/hosts breaks all dns lookups

Categories

(Core :: Networking: DNS, defect)

x86
Linux
defect
Not set
normal

Tracking


RESOLVED WORKSFORME

People

(Reporter: jamie, Unassigned)


Details

(Whiteboard: [exterminationweek])

Attachments

(1 file, 2 obsolete files)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5a) Gecko/20030718
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5a) Gecko/20030718

If the /etc/hosts file contains a line of >= 998 bytes (they might even be
spaces), mozilla will not be able to make any DNS lookups (even if the host.conf
file is set to 'order bind,hosts'). Mozilla shows an alert popup saying '<url>
could not be found. Please check the name and try again', where <url> is the URL
you are trying to reach.

Reproducible: Always

Steps to Reproduce:
1. Create a line in /etc/hosts with >= 998 bytes of non-end-of-line characters
2. Fire up your browser
3. You can't go anywhere today. ;)

Actual Results:  
Mozilla shows an alert popup saying '<url> could not be found. Please check the
name and try again', where <url> is the URL you are trying to reach.

Expected Results:  
A failed parsing of the hosts file, but a successful fallback to DNS

At first I was wondering if this was some kind of a glitch in libresolv, but all
other applications (lynx, perl's gethostbyname, ping, etc) work fine.
Mozilla does not in fact parse the hosts file itself; it just uses
gethostbyname2 and friends... 
I can confirm this, at least with glibc 2.2.5.

I said:
 perl -e 'print "127.0.0.1", " "x500, "long.example", " "x500, \
 "long\n";' >>/etc/hosts

and, when I started mozilla, it said it couldn't find the proxy server I 
listed. That's wrong since it's squid and it's on my machine and it's 
running. Removing the line fixes it.

As the reporter says, lynx and a simple perl script work. I have a 
simple program that uses gethostbyname2 and it works as well. Strace 
doesn't seem to produce useful information but the resolver code is a 
mess so that's not surprising.

I'm going to confirm this bug.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OK, necko calls PR_GetIPNodeByName passing a 1024-byte buffer. On Linux 
PR_GetIPNodeByName calls gethostbyname2_r, which seems to be failing 
because the buffer is too small; gethostbyname2_r is documented as 
returning ERANGE for this situation. I'm not sure if this propagates 
back to necko. In any case, necko just gives up. It should use a larger 
buffer if necessary.

At least this is what I think is happening.
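For anyone poking at this, here is a minimal standalone sketch (my own, not necko/NSPR code) that calls glibc's gethostbyname2_r with the same fixed 1024-byte buffer size, so you can check whether the lookup comes back with ERANGE once the long line is in /etc/hosts:

--- 8< ---
/* Minimal sketch, not necko code: call glibc's gethostbyname2_r with a
 * fixed 1024-byte buffer (the size necko passes) and report whether the
 * lookup fails with ERANGE.  A caller that simply gives up on ERANGE,
 * instead of retrying with a bigger buffer, fails every lookup. */
#define _GNU_SOURCE
#include <errno.h>
#include <netdb.h>
#include <stdio.h>
#include <sys/socket.h>

int main(int argc, char **argv)
{
    const char *name = argc > 1 ? argv[1] : "localhost";
    char buf[1024];                       /* fixed-size buffer, as in necko */
    struct hostent he, *result = NULL;
    int err = 0;

    int rc = gethostbyname2_r(name, AF_INET, &he, buf, sizeof(buf),
                              &result, &err);
    if (rc == ERANGE)
        printf("%s: buffer too small (ERANGE)\n", name);
    else if (rc != 0 || result == NULL)
        printf("%s: lookup failed (rc=%d, h_errno=%d)\n", name, rc, err);
    else
        printf("%s: resolved as %s\n", name, result->h_name);
    return 0;
}
--- 8< ---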
darin likes to remind me that mozilla doesn't support DNS per se; it just
accesses an API.

Since I'm not the greatest whitebox tester, I haven't had much time to do much
work on testing our interaction w/ the resolver APIs.

I guess this means I need to expand my thinking beyond just limits to valid
domains (I was working on a test for that just today), and possibly include
other ranges of returned values.
Summary: long lines in /etc/hosts breaks all dns lookups → DNS: long lines in /etc/hosts breaks all dns lookups
I filed bug 214752 against NSPR because PR_GetIPNodeByName does not provide a
way to detect a "buffer too small" error. Once that's fixed it should be
straightforward to add a test for this case and resize the buffer.
please note that mozilla will shortly be calling getaddrinfo on most platforms.
see bug 205726 for details.
Depends on: 205726
Jamie: can you test a more recent build and update us?
*** Bug 228422 has been marked as a duplicate of this bug. ***
May well be a glibc bug. If I add a 4200+ character line to /etc/hosts then 
getaddrinfo from glibc 2.2.5 segfaults but glibc 2.3.2 works.
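In case it helps others reproduce, a minimal standalone getaddrinfo test (my own sketch, not the glibc test case) can be used to check whether the libc resolver itself misbehaves with the long line in place:

--- 8< ---
/* Minimal sketch: resolve a name with getaddrinfo so the result can be
 * compared against Firefox's behaviour when /etc/hosts contains an
 * overly long line. */
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
    const char *name = argc > 1 ? argv[1] : "www.mozilla.org";
    struct addrinfo hints, *res = NULL;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    int rc = getaddrinfo(name, NULL, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo(%s) failed: %s\n", name, gai_strerror(rc));
        return 1;
    }
    printf("getaddrinfo(%s) succeeded\n", name);
    freeaddrinfo(res);
    return 0;
}
--- 8< ---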
Is there a legal limit to a hostname in /etc/hosts?

My Mac says the following for sethostname().

Host names are limited to MAXHOSTNAMELEN (from <sys/param.h>) characters,
     currently 256.  This includes the trailing NUL

I think DNS has a limit of 252 characters...
The bug is still present with Firefox 1.0.7: see
http://qa.mandriva.com/show_bug.cgi?id=18183

This bug still exists on Fx 1.5.0.1. For me it failed at lines longer than 1004 characters.
(In reply to comment #12)
> This bug still exists on Fx 1.5.0.1. For me it failed at lines longer than
> 1004 characters.
> 

I confirm. An 832-byte-long line broke my head and FF 1.5.0.1 for three days until I found this bug report. After a little cleanup of my /etc/hosts aliases, everything works fine again.

I'm using GNU/Linux 2.6.16, glibc 2.3.6 (also tested with 2.4.0).
Should I file a bug report with the glibc maintainers?

The strange thing is that, as Darin pointed out, every other application worked fine (Lynx/Links, Opera, wget, ncftp, ping...)

Regards,

--Jan
Assignee: darin → nobody
QA Contact: benc → networking
Happened to me too. Broke all Gecko-based browsers on my system (Epiphany, Firefox 64-bit and Firefox 32-bit).

I tried removing my .mozilla, upgrading Firefox, re-compiling, etc.; tracking this down took me 1.5 hours!

It'd be nice to get it fixed so others don't have this problem.
Still exists in 2.0.0.8 and 2.0.0.9

I was even considering changing to konqueror! :P
Still exists in Firefox 3.6.3 on linux 2.6.30.8. This is a real skull-corker all right. 

Konqueror, Opera, lynx, ping, etc. all work just fine. Firefox is dead in the water. Running wireshark to look for DNS packets shows zero DNS traffic from Firefox. Even hosts in /etc/hosts (defined on short and long lines) fail to resolve. IP addresses, however, work just fine. 

7 year old bug. gotta love free software!
Whiteboard: [good first bug]
Assignee: nobody → lusian
Attached patch patch, 0 (obsolete) — Splinter Review
I couldn't reproduce the bug on my Ubuntu 10.04 machine, but I found /nsprpub/pr/test/gethost was failing when /etc/hosts had a long line (> 1000 bytes).  With this patch, gethost passes.
Attachment #446201 - Flags: review?
Attachment #446201 - Flags: review? → review?(wtc)
Comment on attachment 446201 [details] [diff] [review]
patch, 0

The approach in this patch isn't going to work.  If you read the docs for PR_GetHostByName (and look at the impl), you'll see that the buffer passed in by the user needs to exist until after the function call (it contains data pointed to by the hostent struct).  So if we try to realloc the buffer, as you do, we either lose the data (as your patch does), or we'd wind up trying to copy it into the user's buffer, which is too small.  So this needs to get fixed by changing the caller of PR_GetHostByName.

I found some bugs in your impl before I realized the overall problem with the approach.  They're below in case you're curious.

--


> 
>+PR_IMPLEMENT(PRStatus)
>+PR_gethostbyname_r(const char *name, struct hostent *ret, char *buf,
>+                    size_t buflen, struct hostent **result, int *h_errnop)
>+{
>+    char *localbuf = (char *)PR_Malloc(buflen);

Given that the common case will be that the user's passed-in "buf" will be long enough, let's use it the first time, and only perform an allocation if it's too small (memory allocator calls can be expensive).  This will make the logic here a little more complex (not too bad, though: just set localbuf=buf, then in the while(ERANGE) loop, do localbuf==buf ? alloc: realloc.  And only free if localbuf!=buf).

>+    if (NULL == localbuf)
>+    {
>+        PR_SetError(PR_OUT_OF_MEMORY_ERROR, 0);
>+        return PR_FAILURE;
>+    }
>+
>+    while(ERANGE ==
>+          gethostbyname_r(name, ret, localbuf, buflen, result, h_errnop))


The man page for gethostbyname_r just says that it returns "non-zero" on failure.  The error code you want to check for ERANGE is stored in "*h_errnop".  Since h_errnop is coming from the user (and could conceivably be NULL), I'd recommend creating "int err" in the function and using that (you'll need to do "if(h_errnop) {*h_errnop = err;}" of course, whether the call succeeds or fails).  So you may have to turn this into a while(1) with a break for when the function succeeds or fails.

>+    {
>+        buflen *= 2;
>+        localbuf = (char *)PR_Realloc(localbuf, buflen);
>+        if (NULL == localbuf)
>+        {
>+            PR_SetError(PR_OUT_OF_MEMORY_ERROR, 0);
>+            return PR_FAILURE;

This leaks memory: realloc does not free the input ptr if allocating the new block fails.  You'll need to assign the result to a different ptr than localbuf, so you don't lose the address of the old block, and then free(localptr) before returning.

>+        }
>+    }
>+
>+    free(localbuf);
>+    return PR_SUCCESS;

Hmm, you're returning success even if gethostbyname_r failed for some reason other than ERANGE.  Fix.
Attachment #446201 - Flags: review?(wtc) → review-
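For what it's worth, here is a minimal standalone sketch (mine, plain libc, not the NSPR patch) of the pattern the review above describes: try the caller's buffer first, grow a heap copy only when gethostbyname_r reports ERANGE, don't lose the old pointer if allocation fails, and consume the result before the grown buffer is freed (sidestepping the lifetime problem noted at the top of the review). Per comment 18, glibc reports the "buffer too small" case through the return value:

--- 8< ---
/* Minimal sketch, plain libc, not the NSPR patch: the retry pattern the
 * review describes.  Use the caller's (stack) buffer first, switch to a
 * growing heap buffer only on ERANGE, avoid the realloc leak, and use
 * the result before the heap buffer is freed. */
#define _GNU_SOURCE
#include <errno.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>

static int resolve_and_print(const char *name)
{
    char stackbuf[64];                    /* deliberately small to force a retry */
    char *buf = stackbuf;
    size_t buflen = sizeof(stackbuf);
    struct hostent he, *result = NULL;
    int err = 0, rc;

    for (;;) {
        rc = gethostbyname_r(name, &he, buf, buflen, &result, &err);
        if (rc != ERANGE)                 /* glibc returns ERANGE when buflen is too small */
            break;
        buflen *= 2;
        char *bigger = (buf == stackbuf) ? malloc(buflen) : realloc(buf, buflen);
        if (bigger == NULL) {             /* keep the old pointer so nothing leaks */
            if (buf != stackbuf)
                free(buf);
            return -1;
        }
        buf = bigger;
    }

    int ok = (rc == 0 && result != NULL);
    if (ok)
        printf("%s resolved as %s\n", name, result->h_name);
    else
        fprintf(stderr, "%s: lookup failed (rc=%d, h_errno=%d)\n", name, rc, err);

    if (buf != stackbuf)                  /* free only a buffer we allocated */
        free(buf);
    return ok ? 0 : -1;
}

int main(void)
{
    return resolve_and_print("localhost") == 0 ? 0 : 1;
}
--- 8< ---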
Attached patch incomplete; feedback needed (obsolete) — Splinter Review
PR_GetHostByName, PR_GetHostByAddr, and PR_GetIPNodeByName return:
  PR_ERANGE when the buffer is too small
  PR_DIRECTORY_LOOKUP_ERROR for other cases

Glibc2 functions return ERANGE.  When I tested, gethostbyname_r returned ERANGE and h_errnop was set to -1.
Attachment #446201 - Attachment is obsolete: true
Comment on attachment 446703 [details] [diff] [review]
incomplete; feedback needed

OK. I apologize for spamming because I did not take enough time to look into this.

My previous attempt to fix PR_GetHostByName, PR_GetIPNodeByName, and PR_GetHostByAddr was wrong because none of them gets used.  As mentioned in comment 6, Firefox calls PR_GetAddrInfoByName, which calls getaddrinfo.

getaddrinfo works fine regardless of the size of /etc/hosts at least on my machine.

I need more info here, otherwise this is WORKSFORME.  Brin, you didn't attach /etc/hosts in your mail.
Attachment #446703 - Attachment is obsolete: true
I am sorry for misspelling your name, Bryn.  Can you at least tell me the size of your /etc/hosts?
I still can't reproduce this.  What is the version of your glibc?
As requested, my glibc version:  glibc-2.5-42.el5_4.3
I couldn't reproduce this even on CentOS 5.4.  Firefox 3.0.12 & 3.6.3 work fine.  My /etc/hosts contains a 110k+-character line and a 5k-character line.  Please ask for help from CentOS.
Reproduced on Gentoo, with a 990-char long line of "#".

Firefox-3.6.12 and Thunderbird-3.1.6 still lose network connectivity with an overly long line in /etc/hosts.

A traffic capture shows 0 network packets.

Portage 2.1.9.24 (default/linux/amd64/10.0/desktop, gcc-4.4.4, glibc-2.11.2-r3, 2.6.35-gentoo-r10 x86_64)
=================================================================
System uname: Linux-2.6.35-gentoo-r10-x86_64-Intel-R-_Core-TM-2_Quad_CPU_Q9550_@_2.83GHz-with-gentoo-1.12.14
Timestamp of tree: Thu, 09 Dec 2010 11:00:20 +0000
app-shells/bash:     4.1_p7
dev-java/java-config: 2.1.11-r1
dev-lang/python:     2.6.5-r3, 3.1.2-r4
dev-util/cmake:      2.8.1-r2
sys-apps/baselayout: 1.12.14-r1
sys-apps/sandbox:    2.3-r1
sys-devel/autoconf:  2.13, 2.65-r1
sys-devel/automake:  1.9.6-r3, 1.10.3, 1.11.1
sys-devel/binutils:  2.20.1-r1
sys-devel/gcc:       4.4.4-r2
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   2.2.10
sys-devel/make:      3.81-r2
virtual/os-headers:  2.6.36.1 (sys-kernel/linux-headers)
ACCEPT_KEYWORDS="amd64"

CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -O2 -pipe"
I still can't reproduce this.  This might be the cause: http://sources.redhat.com/bugzilla/show_bug.cgi?id=10484

Can you run the program between "--- 8< ---"s with the long line?  Does it run fine?
The program runs just fine. I guess full strace would be helpful.
Hi,

I'm using firefox 8.0 on a Debian Lenny OS and I am affected by this bug. (Or at least, it's really the exact same symptoms)

The program listed here:
http://sources.redhat.com/bugzilla/show_bug.cgi?id=10484
runs fine on my setup.

Every other program I have tested runs fine.

But Firefox isn't able to resolve hosts as long as a line in /etc/hosts has more than 998 characters.

I added a long commented line anywhere in the /etc/hosts file to trigger the bug.

Knowing that this bug might have been around for 8 years led me to think it wasn't that easy to track down and might be caused by (several) other factors.

I tried on a VirtualBox machine with Ubuntu 11.04 and Firefox 6.0, and on a Debian Squeeze with FF 3.5, but it doesn't trigger the problem.

My findings so far: it's not related to the Firefox version, 32 vs 64 bits, or the user's profile.
I can only reproduce the bug on two Debian Lenny systems.
A library? A configuration file somewhere?

Hope that helps
I do not see this error even with a record in /etc/hosts longer than 1024 characters.
I have firefox-10.0.4, glibc-2.14.1-r3 on amd64.
Assignee: lusian → nobody
Whiteboard: [good first bug] → [good first bug][exterminationweek]
Can anyone confirm this after 2012?

Command:
perl -e'print "127.0.0.1 ", "x"x2024, "\n"' >> /etc/hosts

does not give any problems on my machine. Archlinux x64

Firefox: 33.1-1
glibc: 2.20-2
3 years later, I'm now using Debian Wheezy and 
iceweasel 31.2.0esr-2~deb7u1
glibc: 2.13-38

I can't reproduce this bug anymore
I'm going to mark this WFM considering the last few comments.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
Whiteboard: [good first bug][exterminationweek] → [exterminationweek]
I'm using Debian stretch and Firefox ESR 45.4.0 with glibc 2.24-3.

When using 
  perl -e'print "127.0.0.1 ", "x"x2024, "\n"' >> /etc/hosts
(and my own /etc/hosts) I still got the problem.
Reopening per comment #34.
Status: RESOLVED → REOPENED
Component: Networking → Networking: DNS
Resolution: WORKSFORME → ---
Hey Gijs - I'm going to leave this one WFM and keep bug 1312196 open for the issue. We only need one, and this one has been closed for 2 years. They reference each other anyhow.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 8 years ago
Resolution: --- → WORKSFORME
