Closed Bug 304904 Opened 19 years ago Closed 19 years ago

Necko should refuse to look up invalid hostnames containing "%"

Categories

(Core :: Networking, defect)

defect
Not set
major

Tracking

()

VERIFIED FIXED
mozilla1.8beta5

People

(Reporter: jruderman, Assigned: darin.moz)

References

Details

(Keywords: csectype-spoof, fixed1.8, sec-low)

Attachments

(4 files, 11 obsolete files)

6.70 KB, patch
Details | Diff | Splinter Review
30.80 KB, patch
Details | Diff | Splinter Review
15.21 KB, patch
Details | Diff | Splinter Review
9.29 KB, patch
Biesinger
: review+
darin.moz
: superreview+
Details | Diff | Splinter Review
Firefox should refuse to look up hostnames containing "%", so that it can be
consistently secure against spoofing across platforms. See bug 246804 comment 14.

This can be accomplished with either a whitelist or a blacklist:
- Blacklist "%" and "\".
- Whitelist the legal characters and "_". (Maybe enforce syntax too, but that's
bug 140379.)
This is necessary to prevent spoofing, especially when Firefox is used with mail
clients that don't have a fix for bug 304905.
Whiteboard: [sg:fix]
Flags: blocking1.8b4?
This could be done in nsAuthURLParser::ParseUserInfo in
mozilla/netwerk/base/src/nsURLParsers.cpp
Andreas, do you think you can help us with a patch on this? We'd like to do this
but Darin's out on vacation. 
Flags: blocking1.8b4? → blocking1.8b4+
This may take a few hours, I first have to get a current tree and build it ...
Awesome! Thanks, Andreas, for looking into this.
Assignee: darin → andreas.otte
This needs testing against a lot of urls, especially IDN-urls
So far I found no problem ... URLs with % in the hostname however get a nice
little alert box saying "The URL is not valid an can not be loaded."

Attachment #192981 - Flags: review?(bzbarsky)
Attachment #192981 - Flags: superreview?(bzbarsky)
Attachment #192981 - Flags: review?(bzbarsky)
Attachment #192981 - Flags: review?(benjamin)
However what is done with the patch violates RFC 3986 (3.2.2):

   This specification does not mandate a particular registered name
   lookup technology and therefore does not restrict the syntax of reg-
   name beyond what is necessary for interoperability.  Instead, it
   delegates the issue of registered name syntax conformance to the
   operating system of each application performing URI resolution, and
   that operating system decides what it will allow for the purpose of
   host identification.  A URI resolution implementation might use DNS,
   host tables, yellow pages, NetInfo, WINS, or any other system for
   lookup of registered names.  However, a globally scoped naming
   system, such as DNS fully qualified domain names, is necessary for
   URIs intended to have global scope.  URI producers should use names
   that conform to the DNS syntax, even when use of DNS is not
   immediately apparent, and should limit these names to no more than
   255 characters in length.

   The reg-name syntax allows percent-encoded octets in order to
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   represent non-ASCII registered names in a uniform way that is
   independent of the underlying name resolution technology.  Non-ASCII
   characters must first be encoded according to UTF-8 [STD63], and then
   each octet of the corresponding UTF-8 sequence must be percent-
   encoded to be represented as URI characters.  URI producing
   applications must not use percent-encoding in host unless it is used
   to represent a UTF-8 character sequence.  When a non-ASCII registered
   name represents an internationalized domain name intended for
   resolution via the DNS, the name must be transformed to the IDNA
   encoding [RFC3490] prior to name lookup.  URI producers should
   provide these registered names in the IDNA encoding, rather than a
   percent-encoding, if they wish to maximize interoperability with
   legacy URI resolvers.
The following URL is a perfectly valid IDN-URL in percent-encoded octet form. We
should be able to convert this to UTF8 and then convert to punicode for DNS
lookup. With the patch in this bug we would make this an invalid URL which it
clearly is not. However even without the patch we choke on it because we do not
convert over UTF8 to punicode for DNS lookup. 
 http://www.%E3%81%BB%E3%82%93%E3%81%A8%E3%81%86%E3%81%AB%E3%81%AA%E3%81%8C%E3%81%84%E3%82%8F%E3%81%91%E3%81%AE%E3%82%8F%E3%81%8B%E3%82%89%E3%81%AA%E3%81%84%E3%81%A9%E3%82%81%E3%81%84%E3%82%93%E3%82%81%E3%81%84%E3%81%AE%E3%82%89%E3%81%B9%E3%82%8B%E3%81%BE%E3%81%A0%E3%81%AA%E3%81%8C%E3%81%8F%E3%81%97%E3%81%AA%E3%81%84%E3%81%A8%E3%81%9F%E3%82%8A%E3%81%AA%E3%81%84.w3.mag.keio.ac.jp/

This is the the url in punicode for which works fine:
 
http://www.xn--n8jaaaaai5bhf7as8fsfk3jnknefdde3fg11amb5gzdb4wi9bya3kc6lra.w3.mag.keio.ac.jp/

It seems to me we can use the patch as an intermittend solution (because we fail
anyway), the real solution would be to try to convert the hostname into UTF8 if
it contains at least one '%'. If that conversion fails, we throw an error. If it
succeeds, we convert to punicode for DNS lookup.
Comment on attachment 192981 [details] [diff] [review]
it seems to be a simple as this ...

I'm not a netlib peer, bumping to biesi.
Attachment #192981 - Flags: review?(benjamin) → review?(cbiesinger)
Comment on attachment 192981 [details] [diff] [review]
it seems to be a simple as this ...

Looks ok to me, but I'd really like darin to take a look at some point
(possibly post-landing) because of the IDN issue, which I don't fully
understand.
Attachment #192981 - Flags: superreview?(bzbarsky)
Attachment #192981 - Flags: superreview+
Attachment #192981 - Flags: review?(darin)
I see the patch as branch-1.8b4-only for a quick solution, that does not make
things worse than they already are (IDN-wise). For the trunk I'm trying to go
with the encoding check, but this may take some time (post darins vacation?)
because I don't know very much about IDN and charset conversion.
Attachment #192981 - Flags: review?(cbiesinger) → review+
Attachment #192981 - Flags: approval1.8b4?
Changing Hardware/OS to All to better describe the scope.
Status: NEW → ASSIGNED
OS: MacOS X → All
Hardware: Macintosh → All
Flags: blocking1.8b5+
Please land this on the trunk and we'll get it into the branch if all looks good.
The preliminary fix has been checked in on the trunk
The patch as checked into trunk breaks both SeaMonkey and Thunderbird - in both
of them the Local Folders directory no longer appears correctly and generates
lots of errors.

I backed this patch out again (locally) and everything was fine.

See bug 305868 and bug 305900 for reports of the regressions.
There are regressions, this has to be backed out. But that must mean that there
are urls which contain a '%' as hostname. 
backed out
Comment on attachment 192981 [details] [diff] [review]
it seems to be a simple as this ...

minusing for the branch. This caused some very unpleasant regressions
especially in mail.
Attachment #192981 - Flags: approval1.8b4? → approval1.8b4-
This broke mailbox urls looking like mailbox://nobody@Local%20Folders/Junk for
example.
(In reply to comment #18)
> There are regressions, this has to be backed out. But that must mean that there
> are urls which contain a '%' as hostname. 

We have plenty of URIs that contain host names with percents in them. The URIs
for accounts like Local Folders, RSS Feeds, etc. all get their spaces escaped
and look like:

i.e. mailbox://nobody@Local%20Folders/Templates
or
mailbox://nobody@RSS%20Feeds/Mozilla%20Feeds
Yep, I did not think about those. It would have been nice to catch bad hostnames
in the URLParser, but that seems impossible. Most likely we will have to push
this  check into the DNS resolver.
This is the new plan:

change AsyncResolve and Resolve in nsDNSService2.cpp just before the idn stuff
happens:

- unescape the hostname (new!)
- do the UTF8toACE conversion if necessary (old!)
- check for a list of invalid characters (blablist), if found return error
INVALID_HOSTNAME 


There may be better ways to do this. This might also not work with proxys, they
seem to bypass the resolver. Please try this patch on windows, please try with 
with proxys. Unescaping the host before checking with IsASCII also seems to do
almost the right thing with %-encoded IDN hostnames.
Who knows the proxy stuff?
This can be tested with the given IDN URL
Comment on attachment 193972 [details] [diff] [review]
this combined patch now even handles transparent proxys with remote DNS

requesting reviews, this time the local folders are there ;-)
Attachment #193972 - Flags: superreview?(bzbarsky)
Attachment #193972 - Flags: review?(cbiesinger)
Comment on attachment 193972 [details] [diff] [review]
this combined patch now even handles transparent proxys with remote DNS

>Index: netwerk/base/src/nsSocketTransport2.cpp
>+                const char *c, *begin = hostStart.get();
>+                for (c = begin; c != hostEnd.get(); ++c) {

Why do you need |begin| here?  Just start the loop with c = hostStart.get(). 
Same in the other places you have these loops.
Attachment #193972 - Flags: superreview?(bzbarsky) → superreview+
Attached patch Version (2.1) (obsolete) — Splinter Review
incorporated bzbarskys comment
Attachment #192981 - Attachment is obsolete: true
Attachment #193862 - Attachment is obsolete: true
Attachment #193972 - Attachment is obsolete: true
Attachment #193972 - Flags: review?(cbiesinger)
Attachment #194002 - Flags: superreview?(bzbarsky)
Attachment #194002 - Flags: review?(cbiesinger)
Attachment #194002 - Flags: superreview?(bzbarsky) → superreview+
Summary: Firefox should refuse to look up invalid hostnames containing "%" → Necko should refuse to look up invalid hostnames containing "%"
Comment on attachment 194002 [details] [diff] [review]
Version (2.1)

(could you also diff with -p?)

+		 NS_UnescapeURL(PromiseFlatCString(mHost).get(),
mHost.Length(),

No need for the PromiseFlatCString, mHost is already flat.

I don't like it that the invalid character list is copied into various files,
can we centralize it? maybe a function net_IsValidHostName(const_iterator
start, const_iterator end) in netCore.h or nsURLHelper.h? Could maybe use
net_FindCharInSet too.

+			 return NS_ERROR_UNKNOWN_HOST;

hrm, maybe another error code would be better. A new NS_ERROR_INVALID_HOST?

To avoid the duplication in nsDNSService2.cpp, maybe the code should move to
nsHostResolver::ResolveHost?

This doesn't address HTTP/HTTPS/FTP/Gopher proxies, is that a problem?
Attachment #194002 - Flags: review?(cbiesinger) → review-
(In reply to comment #32)
> (From update of attachment 194002 [details] [diff] [review] [edit])
> (could you also diff with -p?)
> 
> +		 NS_UnescapeURL(PromiseFlatCString(mHost).get(),
> mHost.Length(),
> 
> No need for the PromiseFlatCString, mHost is already flat.

okay, I will remove that
 
> I don't like it that the invalid character list is copied into various files,
> can we centralize it? maybe a function net_IsValidHostName(const_iterator
> start, const_iterator end) in netCore.h or nsURLHelper.h? Could maybe use
> net_FindCharInSet too.

yes, that might be better. I will try that.
 
> +			 return NS_ERROR_UNKNOWN_HOST;
> 
> hrm, maybe another error code would be better. A new NS_ERROR_INVALID_HOST?

I will try that, but it might not be so easy, because a certain reaction is
needed which seems to be limited to a given list, also localization for the new
message might be needed => to late for 1.8b4

> To avoid the duplication in nsDNSService2.cpp, maybe the code should move to
> nsHostResolver::ResolveHost?

The ACE conversion is integral part of the test which was there before, I don't
know if that can be moved, but I will take a look.
 
> This doesn't address HTTP/HTTPS/FTP/Gopher proxies, is that a problem?

It does not need to. Only those proxys are relevant which delegate the name
resolution to the server, all other cases are addressed by the code in
nsDNSService2.
>It does not need to. Only those proxys are relevant which delegate the name
>resolution to the server

the ones I mentioned do delegate name resolution to the server...
(In reply to comment #34)
> >It does not need to. Only those proxys are relevant which delegate the name
> >resolution to the server
> 
> the ones I mentioned do delegate name resolution to the server...

Where in the code does this show? The only place I could find was in
nsSocketTransport2.cpp with SOCKS5, everything else seems to be resolved locally.
  
Oops .. this patch still uses NS_ERROR_UNKNOWN_HOST, using a new error code
would be a huge undertaking, just look where NS_ERROR_UNKNOWN_HOST is used in
the code.
Attachment #194047 - Flags: review?(cbiesinger)
(In reply to comment #35)
> Where in the code does this show? The only place I could find was in
> nsSocketTransport2.cpp with SOCKS5, everything else seems to be resolved locally.

I think this is in nsSocketTransport2.cpp ResolveHost, which uses SocketHost()
for the hostname, which is the proxy host for the mentioned cases. note
nsSocketTransport::Init:

    if (proxyInfo) {
        mProxyPort = proxyInfo->Port();
        mProxyHost = proxyInfo->Host();
(In reply to comment #38)
> I think this is in nsSocketTransport2.cpp ResolveHost, which uses SocketHost()
> for the hostname, which is the proxy host for the mentioned cases. note
> nsSocketTransport::Init:
> 
>     if (proxyInfo) {
>         mProxyPort = proxyInfo->Port();
>         mProxyHost = proxyInfo->Host();

I see. SocketHost() is defined as

const nsCString &SocketHost() { return (!mProxyHost.IsEmpty() &&
!mProxyTransparent) ? mProxyHost : mHost; }

So we resolve the proxyhost if mProxyHost is not empty and the proxy is not
transparent, in all other cases we resolve the real host (unless the proxy
supports remoteDNS). The last case I've alredy got.

So if (!mProxyHost.IsEmpty() && !mProxyTransparent) we also need to check mHost.

Attached patch patch version 2.3 (obsolete) — Splinter Review
This version also finds the other cases where we use a proxy.
Attachment #194047 - Attachment is obsolete: true
Attachment #194053 - Flags: review?(cbiesinger)
Comment on attachment 194053 [details] [diff] [review]
patch version 2.3

hm, could you diff with both -p and -u? :)

I'm slightly concerned that this will make some valid, or at least used,
hostnames invalid... but maybe we can wait until we get bug reports.


nsSocketTransport2.cpp

+	  NS_UnescapeURL(mHost.get(), mHost.Length(),
+			 esc_AlwaysCopy, unescapedHost);

there's a version of NS_UnescapeURL that takes a string...


--- netwerk/base/src/nsStandardURL.cpp	27 Aug 2005 20:24:49 -0000
!     // this function returns PR_TRUE iff it writes something to |result|.
!     // this function returns PR_TRUE if it writes something to |result|.

why this change?



!     NS_UnescapeURL(PromiseFlatCString(host).get(), host.Length(),
!					esc_AlwaysCopy, unescapedHost);

that second line seems wrongly indented.


So with the standardurl change... is the unescaping in the DNS Service still
needed? I think one could make a case that callers should unescape if necessary
and that DNS service shouldn't have to. I guess that may also need simple uri
changes, then.

It looks like all your unescapeurl calls misindent the second line...


I don't understand the ACE problem with moving this to nsHostResolver... isn't
the host already in ACE form by the time the host resolver sees it?
(In reply to comment #41)
> (From update of attachment 194053 [details] [diff] [review] [edit])
> hm, could you diff with both -p and -u? :)

I will do next time ...
 
> I'm slightly concerned that this will make some valid, or at least used,
> hostnames invalid... but maybe we can wait until we get bug reports.
 
There is a slight risk there, but in pushing this check all the way to or just
before the resolver, I think it is very slim. I constructed the set of invalid
characters by using the reserved set of characters from rfc 3986 minus "[]:"
(for IP addresses) and adding "%\".  

We could do an even better check by looking for the correct syntax of the
hostname (IP4/IP6/Hostname containing '.', '-', digits and numbers) but that
would increase the risk.

> nsSocketTransport2.cpp
> 
> +	  NS_UnescapeURL(mHost.get(), mHost.Length(),
> +			 esc_AlwaysCopy, unescapedHost);
> 
> there's a version of NS_UnescapeURL that takes a string...

I don't want to use that version, because it uses nsUnescapeCount internaly and
I trust this relict from the NN4 codebase not one bit.
 
> --- netwerk/base/src/nsStandardURL.cpp	27 Aug 2005 20:24:49 -0000
> !     // this function returns PR_TRUE iff it writes something to |result|.
> !     // this function returns PR_TRUE if it writes something to |result|.
> 
> why this change?
 
It looked like a typo to me, I leave it in. 

> !     NS_UnescapeURL(PromiseFlatCString(host).get(), host.Length(),
> !					esc_AlwaysCopy, unescapedHost);
> 
> that second line seems wrongly indented.

I will fix that. 

> So with the standardurl change... is the unescaping in the DNS Service still
> needed? I think one could make a case that callers should unescape if necessary
> and that DNS service shouldn't have to. I guess that may also need simple uri
> changes, then.

Yes, that would be needed and possibly more. I think the resolver should do it
just in case, just to be sure. The unescaping in nsStandardURL is also needed,
because even if resolving correctly without the unescape, the request is wrong.
I stumbled about that when trying to figure out why the long escaped url from
comment #10 did not work. It does now.

> It looks like all your unescapeurl calls misindent the second line...
> 
> 
> I don't understand the ACE problem with moving this to nsHostResolver... isn't
> the host already in ACE form by the time the host resolver sees it?

Yes, I will look at it again. I wanted to move more than the check into
nsHostResolver, but that didn't work.
fixed identions and moved check into nsHostResolver
Attachment #194053 - Attachment is obsolete: true
Attachment #194053 - Flags: review?(cbiesinger)
[iff -> if]
>> why this change?
> It looked like a typo to me, I leave it in. 

nah, "iff" here means "if and only if"
oh, also: when speaking about NS_UnescapeURL, I meant this variant:
http://lxr.mozilla.org/seamonkey/source/xpcom/io/nsEscape.h#174
174 inline const nsACString &
175 NS_UnescapeURL(const nsASingleFragmentCString &str, PRUint32 flags,
nsACString &result) {
176     const char *temp;
177     if (NS_UnescapeURL(str.BeginReading(temp), str.Length(), flags, result))
178         return result;
179     return str;
180 }
Comment on attachment 194065 [details] [diff] [review]
version 2.4 (diff -u and diff -p concatenated)

(this isn't a -u diff)
Attachment #194065 - Flags: review+
Attachment #194065 - Flags: superreview?(bzbarsky)
I remember why I didn't use nsRefPtr... because it didn't work:
nsObjectLoadingContent doesn't implement any AddRef/Release functions, and the
ones it inherits are ambiguous.
(In reply to comment #47)
> I remember why I didn't use nsRefPtr... because it didn't work:
> nsObjectLoadingContent doesn't implement any AddRef/Release functions, and the
> ones it inherits are ambiguous.

um, sorry, wrong bug.
Whiteboard: [sg:fix] → [sg:fix][needs SR bzbarsky]
Attachment #194065 - Flags: review?(darin)
> version 2.4 (diff -u and diff -p concatenated)

You want "diff -up8" I think...
Comment on attachment 194065 [details] [diff] [review]
version 2.4 (diff -u and diff -p concatenated)

>Index: netwerk/base/src/nsURLHelper.cpp
>+ net_IsValidHostName(const char *hostname, PRUint32 hostnameLen)
>+ {
>+     const char* end = hostname+hostnameLen; 

Spaces around the '+', please.

>Index: netwerk/base/src/nsStandardURL.cpp

>!     NS_UnescapeURL(PromiseFlatCString(host).get(), host.Length(),
>!                                       esc_AlwaysCopy, unescapedHost);

Fix the indent here, please....

I don't really know the ACE stuff enough to meaningfully review the rest of
this change, so I hope Darin will look at it before we ship it.  ;)
Attachment #194065 - Flags: superreview?(bzbarsky) → superreview+
Okay, I incorporated the latest review comments. Same procedure as last time. I
will checkin into the trunk. 
It's hard to know from reading the comments what the current patch actually does
- can someone summarise?

It may be relevant to point out that we already have a character blacklist for
domain names, which currently contains a couple of homographs of "/" and is
being expanded over in bug 301694.

Gerv
The latest patch here disallows:
/?#@!$&'()*+,;=%\

Hm... I think + should be removed from that list, possible & as well
This patch:

1. checks for a list of blacklisted ascii characters just before hostname
resolution. At that point the hostname is

- unescaped
- if UTF8 converted to punycode

So it contains only ascii characters at this point which makes the list rather
small.

2. checks for a list of blacklisted ascii characters when we use a proxy and do
no name resolution. Same conditions as with 1.

3. does an additional unescape before the ACE/UTF8 checks to catch hexencoded
UTF8 hostnames.
(In reply to comment #54)
> The latest patch here disallows:
> /?#@!$&'()*+,;=%\
> 
> Hm... I think + should be removed from that list, possible & as well

Why? Do you have a case where this characters are part of the hostname off an
url that is actually resolved?
Hold it a sec, chaps - we probably need to coordinate a little more closely
here. For a start, we need to check whether the IDN and non-IDN lists can be the
same check; secondly, Opera have some experience in this area that we should
leverage.

I have to go out now; I'll write more later.

Gerv
yes, I seem to recall a hostname that contained a +. unfortunately I forgot
which one that was...
I found this about hostnames:


[idn] hostname history hell

by "Eric A. Hall" <ehall@ehsco.com>


The earliest definition I know of for hostnames is rfc608 from 1974.

| in which <host-name> will be the official Host Name, a
| string obtained through negotiation between the Host and the
| NIC, governed by these constraints:
| 
|    up to 48 characters drawn from the alphabet (A-Z),
|    digits (0-9), and the minus sign (-) ... specifically,
|    no blank or space characters allowed;
| 
|    no distinction between upper and lower case letters;
| 
|    the first character is a letter;
| 
|    the last character is NOT a minus sign;
| 
|    no other restrictions on content or syntax.

Minimum 1 alpha character, maximum 48 alphanumeric chars, first char alpha
(ergo not hyphen), last char alphanumeric. The exclusion of beginning and
ending hyphens had to do with syntax of multi-part service-specific names
as is more clear later.

One side-note is that hyphen is one of the only punctuation characters
that is common between all implementations of the ECMA-6 charset.

Updated by rfc810

| 1. A "name" (Net, Host, Gateway, or Domain name) is a text string up
| to 24 characters drawn from the alphabet (A-Z), digits (0-9), and the
| minus sign (-) and period (.).  No blank or space characters are
| permitted as part of a name.  No distinction is made between upper
| and lower case.  The first character must be a letter.  The last
| character must not be a minus sign or period.  A host which serves as
| a GATEWAY should have "-GATEWAY" or "-GW" as part of its name.  A
| host which is a TIP or a TAC should have  "-TIP" or "-TAC" as part of
| its host name, if it is an ARPANET or DoD host.

Minimum 1 alpha character, maximum 24 alphanumeric chars (half the
original length from rfc608), first char alpha (ergo not hyphen), last
char alphanumeric. No explicit FQDN lengths defined.

Updated by rfc952

| 1. A "name" (Net, Host, Gateway, or Domain name) is a text string up
| to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus
| sign (-), and period (.).  Note that periods are only allowed when
| they serve to delimit components of "domain style names". (See
| RFC-921, "Domain Name System Implementation Schedule", for
| background).  No blank or space characters are permitted as part of a
| name. No distinction is made between upper and lower case.  The first
| character must be an alpha character.  The last character must not be
| a minus sign or period.  A host which serves as a GATEWAY should have
| "-GATEWAY" or "-GW" as part of its name.  Hosts which do not serve as
| Internet gateways should not use "-GATEWAY" and "-GW" as part of
| their names. A host which is a TAC should have "-TAC" as the last
| part of its host name, if it is a DoD host.  Single character names
| or nicknames are not allowed.

Minimum 2 alphanumeric characters (double the original minimum), maximum
24 alphanumeric chars, first char alpha (ergo not hyphen), last char
alphanumeric. No explicit FQDN lengths defined.

All of the above specifies host names as they appear in hosts databases.
These rules were often enforced by /etc/hosts implementations as well (eg,
some unix systems have maximum entry lengths of 24 chars in the local
/etc/hosts database).

rfc1035 expanded the hostname rules implicitly

| However, when assigning a domain name for an object, the prudent user
| will select a name which satisfies both the rules of the domain system
| and any existing rules for the object, whether these rules are published
| or implied by existing programs.

| The labels must follow the rules for ARPANET host names.  They must
| start with a letter, end with a letter or digit, and have as interior
| characters only letters, digits, and hyphen.  There are also some
| restrictions on the length.  Labels must be 63 characters or less.

Note that rfc1035 allows labels (and thus hostnames) to have minimum
length of 1 character, with a maximum of 63 characters for each label, and
with a fully-qualified domain name (and thus hostnames) to be 255
characters including the separators.

Minimum 2 alphanumeric character (inherited from rfc952), maximum 63
alphanumeric chars, first char alpha (ergo not hyphen), last char
alphanumeric. Maximum 255 for FQDN.

Some unification and clarification by rfc1123

| The syntax of a legal Internet host name was specified in RFC-952
| [DNS:4].  One aspect of host name syntax is hereby changed: the
| restriction on the first character is relaxed to allow either a
| letter or a digit.  Host software MUST support this more liberal
| syntax.

[first char alphanumeric, but not hyphen]

| Host software MUST handle host names of up to 63 characters and
| SHOULD handle host names of up to 255 characters.

[unified label length of 63 max, unified FQDN of 255, no unified minimum
length so minimum of 2 inherited from rfc952 prevails]

| Whenever a user inputs the identity of an Internet host, it SHOULD
| be possible to enter either (1) a host domain name or (2) an IP
| address in dotted-decimal ("#.#.#.#") form.  The host SHOULD check
| the string syntactically for a dotted-decimal number before
| looking it up in the Domain Name System.
| 
| DISCUSSION:
|      This last requirement is not intended to specify the complete
|      syntactic form for entering a dotted-decimal host number;
|      that is considered to be a user-interface issue.  For
|      example, a dotted-decimal number must be enclosed within
|      "[ ]" brackets for SMTP mail (see Section 5.2.17).  This
|      notation could be made universal within a host system,
|      simplifying the syntactic checking for a dotted-decimal
|      number.
|      
|      If a dotted-decimal number can be entered without such
|      identifying delimiters, then a full syntactic check must be
|      made, because a segment of a host domain name is now allowed
|      to begin with a digit and could legally be entirely numeric
|      (see Section 6.1.2.4).  However, a valid host name can never
|      have the dotted-decimal form #.#.#.#, since at least the
|      highest-level component label will be alphabetic.

Note that the last sentence says that the trailing label must contain
alpha chars, probably under the assumption that TLDs will be used. The
rule is not explicity stated but it is implied strongly.

Minimum 2 alphanumeric characters, maximum 63 alphanumeric chars, first
char alphanumeric, last char alphanumeric. Maximum 255 for FQDN. FQDN must
contain at least one alpha.

-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/





no '+' allowed by any RFC on the topic and I have not seen it anywhere. '_'
seems to be used sometimes in violation of the standards, I left it off the list. 
doesn't look like this is going to make beta1. We'll re-evaluate the risk in the
beta2 cycle.
Flags: blocking1.8b4rc+
Attachment #192981 - Flags: review?(darin)
Whiteboard: [sg:fix][needs SR bzbarsky] → [sg:fix][needs review darin]
For the branch I'd be happy with simply rejecting hostnames that contain '%'
characters.  I'm leary about unescaping the %-encoded hostnames.  Firefox 1.0
does not unescape '%' characters, and we've barely received any complaints about
that.

In other words, why don't we just take the first patch for the 1.8 branch?

Also, I checked that the squid proxy server does not accept escaped characters
that form a valid IDN (even when all of the escaped characters are simple ASCII
alpha characters).  So, it seems to me that unescaping %-escaped IDN can wait.
Darin, the first patch completly breaks thunderbird and seamonkey-mail because
of the internaly used mailbox-urls that contain '%'. You tried the same thing in
bug 110938 some time ago.

We can loose the change in nsStandardURL.cpp for the branch, those %-encoded
IDN-URLs are rare, but they are mentioned in the RFCs, they seem to exist. sqiud
seems to be broken the same way mozilla currently is.

And then there are the objections of gerv, because there is an effort to create
a unicode based black-/whitelist which is expanded in bug 301694. I don't think
that effort is the right choice for the problems described in this bug, but
maybe there is synergie possible.
I guess we could also loose the unescapes before every check with
IsValidHostName for the branch if they scare you. 
Ah right... the mailbox URLs :-(  How about instead of rejecting the URL at the
parser level, we reject it as the DNS level?  We could add code to
nsDNSService2.cpp (or nsHostResolver.cpp) that scans the hostname and rejects
invalid characters.
That is exactly what this patch does :-)
This patch for the 1.8 branch has no unescapes and the list of invalid
characters is smaller.
Opera 7.5 uses the following (courtesy of Yngve Pettersen):

"The list used in 7.5x is:
   0021-0023;
   0025-002C;
   002F; Forward slash
   003B-0040;
   005C;
   005E;
   007B-007D;
   007F;

"%" and "/" are the most important of these, since they can impact both  visual
and machine interpretation of a URI, especially if the servername  is converted,
and then later parsed again.

The others were added because they are not (or should not be) valid in a  DNS
name. (Hmmm, on second thought, not sure U+002B "+" should be on the  list)"

Gerv
Those are more or less the characters I used, Opera is even more restrictive. +
should not be in a hostname, excluding it is correct, but biesi remembers having
seen at least one host with a +, but does not remember where and when. 

gerv: Where is the check from bug 301694 located, where does it happen?
 
Andreas: as I said in that bug, read the patch attached to bug 283016, which
shows exactly what code there is and where :-)

As for "+", I don't know if Yngve took it out of the list for Opera 8 or not. I
can email him and ask if you like. But I figure if we ban it, people will stop
using it pretty quickly.

Gerv
(In reply to comment #69)
> Andreas: as I said in that bug, read the patch attached to bug 283016, which
> shows exactly what code there is and where :-)

Ooops, I missed that being at work and only looking at the one bug number I
remembered ... 
 
> As for "+", I don't know if Yngve took it out of the list for Opera 8 or not. I
> can email him and ask if you like. But I figure if we ban it, people will stop
> using it pretty quickly.

I think we should block it.

This version has an extended list of invalid characters in sync with Opera and
its own error code NS_ERROR_INVALID_HOST with error texts.
Attachment #194572 - Flags: superreview?(darin)
Attachment #194572 - Flags: review?(cbiesinger)
Whiteboard: [sg:fix][needs review darin] → [sg:fix][needs review+SR darin, review cbiesinger]
Comment on attachment 194572 [details] [diff] [review]
This is the deluxe version of the patch

mozilla/browser/locales/en-US/chrome/overrides/netError.dtd
+<p>The provided hostname is not in a recognized format. If you clicked on a
link most likely someone tries to trick you, otherwise check the location bar
for mistakes and try again.</p>


that should probably be wrapped a bit.

looks good otherwise, r=biesi
Attachment #194572 - Flags: review?(cbiesinger) → review+
Whiteboard: [sg:fix][needs review+SR darin, review cbiesinger] → [sg:fix][needs review+SR darin]
Do I need an additional review for the changes in
mozilla/browser/locales/en-US/chrome/overrides? 

From Ben Goodger for example?
(In reply to comment #73)
> Do I need an additional review for the changes in
> mozilla/browser/locales/en-US/chrome/overrides?

No, per http://www.mozilla.org/projects/firefox/review.html, point #1.
Attachment #195305 - Flags: superreview?(darin)
Attachment #195305 - Flags: review?(cbiesinger)
Comment on attachment 195305 [details] [diff] [review]
supplementing the last patch this version of net_IsValidHostName includes all controlcharacters < \x20

hrm, ok... but this is pretty unreadable...
Attachment #195305 - Flags: review?(cbiesinger) → review+
(In reply to comment #76)
> (From update of attachment 195305 [details] [diff] [review] [edit])
> hrm, ok... but this is pretty unreadable...

That's why I put the comment line with the readable characters above the real
string with the unreadable stuff. Do you have a better way to have all these
control characters in a string and have it still readable? This way it is at
least consistent.
I have no good suggestion, no. maybe darin does :)
It has been a chore trying to follow the comments here, but I wanted to mention
that RFC3490 (http://www.ietf.org/rfc/rfc3490.txt), specifically sections "4.
Conversion operations" through "4.2 ToUnicode" (pages 9-11), should help with
this bug, especially for those who want to brush up on the subject.
Ok, I think time is up for me on this one. Even if a superreview arrives now,
checking something into the tree by myself so close to my vacation seems unwise.
If you want something of this to happen for beta2, somebody else has to do it. 

What happend so far: 

- the first take (catching % in the URL-Parser) was a bad idea because of the
mailbox urls.

- the second take was to catch % and other bad characters in the hostname
shortly before DNS resolution. This approach went through several iterations, in
the process I found a whole class of hostnames (%-encoded UTF8 hostnames) we
currently are unable to handle (but are mentioned in RFC 3986 (3.2.2)).

The patches on this bug do the following:

1. check for a list of blacklisted ascii characters just before hostname
resolution. At that point the hostname is

- unescaped
- if UTF8 converted to punycode

So it contains only ascii characters at this point which makes the list to check
against rather small.

2. check for a list of blacklisted ascii characters when we use a proxy and do
no name resolution. Same conditions as with 1.

3. do an additional unescape in nsStandardURL before the ACE/UTF8 checks to
catch %-encoded UTF8 hostnames.


There are now three different valid patches for this bug:

1) reviewed/superreviewed version for the trunk. This version does what is
listed above, but throws an NS_ERROR_UNKNOWN_HOST which is actually only an
approximation for the real error. Because of this I don't want to have it on the
trunk.

https://bugzilla.mozilla.org/attachment.cgi?id=194146


2) conserative branch version. This version does no unescapes and has a reduced
list of invalid characters. Also uses NS_ERROR_UNKNOWN_HOST. It will not be able
to handle %-encoded UTF8 hostnames.

https://bugzilla.mozilla.org/attachment.cgi?id=194475


3) latest version for the trunk. This version has an extended list of invalid
characters equal to Opera and its own error code NS_ERROR_INVALID_HOST with
error text hinting on phishing. Also it does the unescapes to handle %-encoded
UTF8 hostnames.

https://bugzilla.mozilla.org/attachment.cgi?id=194572


There also is one more patch,

https://bugzilla.mozilla.org/attachment.cgi?id=195305

which adds control-characters to the list of invalid characters for hostnames
(essential when we unescape hostname). This patch can be used as replacement in
any of the avove three patches for the changes to nsURLHelper.cpp.


I would like to have patch 3 (pending superreview, augmented by patch 4) on the
trunk and if it behaves have patch 1 (augmented by patch 4) on the branch for
(1.8b5).
Attachment #194572 - Attachment is obsolete: true
Attachment #195305 - Attachment is obsolete: true
Attachment #194065 - Attachment is obsolete: true
Attachment #194146 - Attachment is obsolete: true
changing owner
Assignee: andreas.otte → darin
Status: ASSIGNED → NEW
Attachment #195305 - Flags: superreview?(darin)
Attachment #194572 - Flags: superreview?(darin)
Attachment #194065 - Flags: review?(darin)
Andreas:  Thanks for the patches.  I'm sorry I wasn't able to review them before
your vacation.  I'll work on getting the minimal patch into the tree for Firefox
1.5.
Severity: normal → major
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.8beta5
I'm curious why version 2 treats '/' as an invalid character?  Since version 2
does not introduce unescaping, it would seem that there is no way for a '/' to
end up in a hostname extracted from an URL.  Is the check for '/' merely an
extra level of precaution in case parsing logic upstream ever changes?
This is Andreas' minimal patch for the 1.8 branch.  I applied my review nits to
the patch, and so I'd just like one more set of eyes on this patch before I
check it in.  Biesi: would you please do the honors?
Attachment #197088 - Flags: superreview+
Attachment #197088 - Flags: review?(cbiesinger)
Attachment #197088 - Flags: review?(cbiesinger) → review+
Attachment #197088 - Flags: approval1.8b5?
fixed-on-trunk

Let's move the enhancements that we want on the trunk into another bug.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Blocks: 309671
Blocks: 309672
See bug 309671 and bug 309672 for followup work.
Trying to verify this with the % laden URL in comment #10. I get a Problem
loading page "Server not found" message page. Is that what is supposed to happen
or should  I have been taken to the same page as the second URL in comment #10?
(In reply to comment #89)
> Trying to verify this with the % laden URL in comment #10. I get a Problem
> loading page "Server not found" message page. Is that what is supposed to happen
> or should  I have been taken to the same page as the second URL in comment #10?

Tracy, what you are observing is correct and expected behavior.  For Firefox
1.5, we are not going to be unescaping % characters found in the hostname part
of the URL.  (See bug 309671.)
Status: RESOLVED → VERIFIED
Attachment #197088 - Flags: approval1.8b5? → approval1.8b5+
fixed1.8
Keywords: fixed1.8
Blocks: 309128
Bug 340276 makes it sound like this might have regressed...
Blocks: 325274
Keywords: csec-spoof, sec-low
Whiteboard: [sg:fix][needs review+SR darin] → [needs review+SR darin]
Whiteboard: [needs review+SR darin]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: