Closed
Bug 304904
Opened 18 years ago
Closed 18 years ago
Necko should refuse to look up invalid hostnames containing "%"
Categories
(Core :: Networking, defect)
Core
Networking
Tracking
()
VERIFIED
FIXED
mozilla1.8beta5
People
(Reporter: jruderman, Assigned: darin.moz)
References
Details
(Keywords: csectype-spoof, fixed1.8, sec-low)
Attachments
(4 files, 11 obsolete files)
6.70 KB,
patch
|
Details | Diff | Splinter Review | |
30.80 KB,
patch
|
Details | Diff | Splinter Review | |
15.21 KB,
patch
|
Details | Diff | Splinter Review | |
9.29 KB,
patch
|
Biesinger
:
review+
darin.moz
:
superreview+
asa
:
approval1.8b5+
|
Details | Diff | Splinter Review |
Firefox should refuse to look up hostnames containing "%", so that it can be consistently secure against spoofing across platforms. See bug 246804 comment 14. This can be accomplished with either a whitelist or a blacklist: - Blacklist "%" and "\". - Whitelist the legal characters and "_". (Maybe enforce syntax too, but that's bug 140379.)
Reporter | ||
Comment 1•18 years ago
|
||
This is necessary to prevent spoofing, especially when Firefox is used with mail clients that don't have a fix for bug 304905.
Reporter | ||
Updated•18 years ago
|
Whiteboard: [sg:fix]
Reporter | ||
Updated•18 years ago
|
Flags: blocking1.8b4?
Comment 2•18 years ago
|
||
This could be done in nsAuthURLParser::ParseUserInfo in mozilla/netwerk/base/src/nsURLParsers.cpp
Comment 3•18 years ago
|
||
Andreas, do you think you can help us with a patch on this? We'd like to do this but Darin's out on vacation.
Flags: blocking1.8b4? → blocking1.8b4+
Comment 4•18 years ago
|
||
This may take a few hours, I first have to get a current tree and build it ...
Comment 5•18 years ago
|
||
Awesome! Thanks, Andreas, for looking into this.
Assignee: darin → andreas.otte
Comment 6•18 years ago
|
||
Comment 7•18 years ago
|
||
This needs testing against a lot of urls, especially IDN-urls
Comment 8•18 years ago
|
||
So far I found no problem ... URLs with % in the hostname however get a nice little alert box saying "The URL is not valid an can not be loaded."
Updated•18 years ago
|
Attachment #192981 -
Flags: review?(bzbarsky)
Updated•18 years ago
|
Attachment #192981 -
Flags: superreview?(bzbarsky)
Attachment #192981 -
Flags: review?(bzbarsky)
Attachment #192981 -
Flags: review?(benjamin)
Comment 9•18 years ago
|
||
However what is done with the patch violates RFC 3986 (3.2.2): This specification does not mandate a particular registered name lookup technology and therefore does not restrict the syntax of reg- name beyond what is necessary for interoperability. Instead, it delegates the issue of registered name syntax conformance to the operating system of each application performing URI resolution, and that operating system decides what it will allow for the purpose of host identification. A URI resolution implementation might use DNS, host tables, yellow pages, NetInfo, WINS, or any other system for lookup of registered names. However, a globally scoped naming system, such as DNS fully qualified domain names, is necessary for URIs intended to have global scope. URI producers should use names that conform to the DNS syntax, even when use of DNS is not immediately apparent, and should limit these names to no more than 255 characters in length. The reg-name syntax allows percent-encoded octets in order to ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ represent non-ASCII registered names in a uniform way that is independent of the underlying name resolution technology. Non-ASCII characters must first be encoded according to UTF-8 [STD63], and then each octet of the corresponding UTF-8 sequence must be percent- encoded to be represented as URI characters. URI producing applications must not use percent-encoding in host unless it is used to represent a UTF-8 character sequence. When a non-ASCII registered name represents an internationalized domain name intended for resolution via the DNS, the name must be transformed to the IDNA encoding [RFC3490] prior to name lookup. URI producers should provide these registered names in the IDNA encoding, rather than a percent-encoding, if they wish to maximize interoperability with legacy URI resolvers.
Comment 10•18 years ago
|
||
The following URL is a perfectly valid IDN-URL in percent-encoded octet form. We should be able to convert this to UTF8 and then convert to punicode for DNS lookup. With the patch in this bug we would make this an invalid URL which it clearly is not. However even without the patch we choke on it because we do not convert over UTF8 to punicode for DNS lookup. http://www.%E3%81%BB%E3%82%93%E3%81%A8%E3%81%86%E3%81%AB%E3%81%AA%E3%81%8C%E3%81%84%E3%82%8F%E3%81%91%E3%81%AE%E3%82%8F%E3%81%8B%E3%82%89%E3%81%AA%E3%81%84%E3%81%A9%E3%82%81%E3%81%84%E3%82%93%E3%82%81%E3%81%84%E3%81%AE%E3%82%89%E3%81%B9%E3%82%8B%E3%81%BE%E3%81%A0%E3%81%AA%E3%81%8C%E3%81%8F%E3%81%97%E3%81%AA%E3%81%84%E3%81%A8%E3%81%9F%E3%82%8A%E3%81%AA%E3%81%84.w3.mag.keio.ac.jp/ This is the the url in punicode for which works fine: http://www.xn--n8jaaaaai5bhf7as8fsfk3jnknefdde3fg11amb5gzdb4wi9bya3kc6lra.w3.mag.keio.ac.jp/ It seems to me we can use the patch as an intermittend solution (because we fail anyway), the real solution would be to try to convert the hostname into UTF8 if it contains at least one '%'. If that conversion fails, we throw an error. If it succeeds, we convert to punicode for DNS lookup.
Comment 11•18 years ago
|
||
Comment on attachment 192981 [details] [diff] [review] it seems to be a simple as this ... I'm not a netlib peer, bumping to biesi.
Attachment #192981 -
Flags: review?(benjamin) → review?(cbiesinger)
![]() |
||
Comment 12•18 years ago
|
||
Comment on attachment 192981 [details] [diff] [review] it seems to be a simple as this ... Looks ok to me, but I'd really like darin to take a look at some point (possibly post-landing) because of the IDN issue, which I don't fully understand.
Attachment #192981 -
Flags: superreview?(bzbarsky)
Attachment #192981 -
Flags: superreview+
Attachment #192981 -
Flags: review?(darin)
Comment 13•18 years ago
|
||
I see the patch as branch-1.8b4-only for a quick solution, that does not make things worse than they already are (IDN-wise). For the trunk I'm trying to go with the encoding check, but this may take some time (post darins vacation?) because I don't know very much about IDN and charset conversion.
Updated•18 years ago
|
Attachment #192981 -
Flags: review?(cbiesinger) → review+
Updated•18 years ago
|
Attachment #192981 -
Flags: approval1.8b4?
Comment 14•18 years ago
|
||
Changing Hardware/OS to All to better describe the scope.
Status: NEW → ASSIGNED
OS: MacOS X → All
Hardware: Macintosh → All
Updated•18 years ago
|
Flags: blocking1.8b5+
Comment 15•18 years ago
|
||
Please land this on the trunk and we'll get it into the branch if all looks good.
Comment 16•18 years ago
|
||
The preliminary fix has been checked in on the trunk
Comment 17•18 years ago
|
||
The patch as checked into trunk breaks both SeaMonkey and Thunderbird - in both of them the Local Folders directory no longer appears correctly and generates lots of errors. I backed this patch out again (locally) and everything was fine. See bug 305868 and bug 305900 for reports of the regressions.
Comment 18•18 years ago
|
||
There are regressions, this has to be backed out. But that must mean that there are urls which contain a '%' as hostname.
Comment 19•18 years ago
|
||
backed out
Comment 20•18 years ago
|
||
Comment on attachment 192981 [details] [diff] [review] it seems to be a simple as this ... minusing for the branch. This caused some very unpleasant regressions especially in mail.
Attachment #192981 -
Flags: approval1.8b4? → approval1.8b4-
Comment 21•18 years ago
|
||
This broke mailbox urls looking like mailbox://nobody@Local%20Folders/Junk for example.
Comment 22•18 years ago
|
||
(In reply to comment #18) > There are regressions, this has to be backed out. But that must mean that there > are urls which contain a '%' as hostname. We have plenty of URIs that contain host names with percents in them. The URIs for accounts like Local Folders, RSS Feeds, etc. all get their spaces escaped and look like: i.e. mailbox://nobody@Local%20Folders/Templates or mailbox://nobody@RSS%20Feeds/Mozilla%20Feeds
Comment 23•18 years ago
|
||
Yep, I did not think about those. It would have been nice to catch bad hostnames in the URLParser, but that seems impossible. Most likely we will have to push this check into the DNS resolver.
Comment 24•18 years ago
|
||
This is the new plan: change AsyncResolve and Resolve in nsDNSService2.cpp just before the idn stuff happens: - unescape the hostname (new!) - do the UTF8toACE conversion if necessary (old!) - check for a list of invalid characters (blablist), if found return error INVALID_HOSTNAME
Comment 25•18 years ago
|
||
There may be better ways to do this. This might also not work with proxys, they seem to bypass the resolver. Please try this patch on windows, please try with with proxys. Unescaping the host before checking with IsASCII also seems to do almost the right thing with %-encoded IDN hostnames.
Comment 26•18 years ago
|
||
Who knows the proxy stuff?
Comment 27•18 years ago
|
||
This can be tested with the given IDN URL
Comment 28•18 years ago
|
||
Comment 29•18 years ago
|
||
Comment on attachment 193972 [details] [diff] [review] this combined patch now even handles transparent proxys with remote DNS requesting reviews, this time the local folders are there ;-)
Attachment #193972 -
Flags: superreview?(bzbarsky)
Attachment #193972 -
Flags: review?(cbiesinger)
![]() |
||
Comment 30•18 years ago
|
||
Comment on attachment 193972 [details] [diff] [review] this combined patch now even handles transparent proxys with remote DNS >Index: netwerk/base/src/nsSocketTransport2.cpp >+ const char *c, *begin = hostStart.get(); >+ for (c = begin; c != hostEnd.get(); ++c) { Why do you need |begin| here? Just start the loop with c = hostStart.get(). Same in the other places you have these loops.
Attachment #193972 -
Flags: superreview?(bzbarsky) → superreview+
Comment 31•18 years ago
|
||
incorporated bzbarskys comment
Attachment #192981 -
Attachment is obsolete: true
Attachment #193862 -
Attachment is obsolete: true
Attachment #193972 -
Attachment is obsolete: true
Updated•18 years ago
|
Attachment #193972 -
Flags: review?(cbiesinger)
Updated•18 years ago
|
Attachment #194002 -
Flags: superreview?(bzbarsky)
Attachment #194002 -
Flags: review?(cbiesinger)
![]() |
||
Updated•18 years ago
|
Attachment #194002 -
Flags: superreview?(bzbarsky) → superreview+
Updated•18 years ago
|
Summary: Firefox should refuse to look up invalid hostnames containing "%" → Necko should refuse to look up invalid hostnames containing "%"
Comment 32•18 years ago
|
||
Comment on attachment 194002 [details] [diff] [review] Version (2.1) (could you also diff with -p?) + NS_UnescapeURL(PromiseFlatCString(mHost).get(), mHost.Length(), No need for the PromiseFlatCString, mHost is already flat. I don't like it that the invalid character list is copied into various files, can we centralize it? maybe a function net_IsValidHostName(const_iterator start, const_iterator end) in netCore.h or nsURLHelper.h? Could maybe use net_FindCharInSet too. + return NS_ERROR_UNKNOWN_HOST; hrm, maybe another error code would be better. A new NS_ERROR_INVALID_HOST? To avoid the duplication in nsDNSService2.cpp, maybe the code should move to nsHostResolver::ResolveHost? This doesn't address HTTP/HTTPS/FTP/Gopher proxies, is that a problem?
Attachment #194002 -
Flags: review?(cbiesinger) → review-
Comment 33•18 years ago
|
||
(In reply to comment #32) > (From update of attachment 194002 [details] [diff] [review] [edit]) > (could you also diff with -p?) > > + NS_UnescapeURL(PromiseFlatCString(mHost).get(), > mHost.Length(), > > No need for the PromiseFlatCString, mHost is already flat. okay, I will remove that > I don't like it that the invalid character list is copied into various files, > can we centralize it? maybe a function net_IsValidHostName(const_iterator > start, const_iterator end) in netCore.h or nsURLHelper.h? Could maybe use > net_FindCharInSet too. yes, that might be better. I will try that. > + return NS_ERROR_UNKNOWN_HOST; > > hrm, maybe another error code would be better. A new NS_ERROR_INVALID_HOST? I will try that, but it might not be so easy, because a certain reaction is needed which seems to be limited to a given list, also localization for the new message might be needed => to late for 1.8b4 > To avoid the duplication in nsDNSService2.cpp, maybe the code should move to > nsHostResolver::ResolveHost? The ACE conversion is integral part of the test which was there before, I don't know if that can be moved, but I will take a look. > This doesn't address HTTP/HTTPS/FTP/Gopher proxies, is that a problem? It does not need to. Only those proxys are relevant which delegate the name resolution to the server, all other cases are addressed by the code in nsDNSService2.
Comment 34•18 years ago
|
||
>It does not need to. Only those proxys are relevant which delegate the name
>resolution to the server
the ones I mentioned do delegate name resolution to the server...
Comment 35•18 years ago
|
||
(In reply to comment #34) > >It does not need to. Only those proxys are relevant which delegate the name > >resolution to the server > > the ones I mentioned do delegate name resolution to the server... Where in the code does this show? The only place I could find was in nsSocketTransport2.cpp with SOCKS5, everything else seems to be resolved locally.
Comment 36•18 years ago
|
||
This
Attachment #193954 -
Attachment is obsolete: true
Attachment #194002 -
Attachment is obsolete: true
Comment 37•18 years ago
|
||
Oops .. this patch still uses NS_ERROR_UNKNOWN_HOST, using a new error code would be a huge undertaking, just look where NS_ERROR_UNKNOWN_HOST is used in the code.
Updated•18 years ago
|
Attachment #194047 -
Flags: review?(cbiesinger)
Comment 38•18 years ago
|
||
(In reply to comment #35) > Where in the code does this show? The only place I could find was in > nsSocketTransport2.cpp with SOCKS5, everything else seems to be resolved locally. I think this is in nsSocketTransport2.cpp ResolveHost, which uses SocketHost() for the hostname, which is the proxy host for the mentioned cases. note nsSocketTransport::Init: if (proxyInfo) { mProxyPort = proxyInfo->Port(); mProxyHost = proxyInfo->Host();
Comment 39•18 years ago
|
||
(In reply to comment #38) > I think this is in nsSocketTransport2.cpp ResolveHost, which uses SocketHost() > for the hostname, which is the proxy host for the mentioned cases. note > nsSocketTransport::Init: > > if (proxyInfo) { > mProxyPort = proxyInfo->Port(); > mProxyHost = proxyInfo->Host(); I see. SocketHost() is defined as const nsCString &SocketHost() { return (!mProxyHost.IsEmpty() && !mProxyTransparent) ? mProxyHost : mHost; } So we resolve the proxyhost if mProxyHost is not empty and the proxy is not transparent, in all other cases we resolve the real host (unless the proxy supports remoteDNS). The last case I've alredy got. So if (!mProxyHost.IsEmpty() && !mProxyTransparent) we also need to check mHost.
Comment 40•18 years ago
|
||
This version also finds the other cases where we use a proxy.
Attachment #194047 -
Attachment is obsolete: true
Updated•18 years ago
|
Attachment #194053 -
Flags: review?(cbiesinger)
Updated•18 years ago
|
Attachment #194047 -
Flags: review?(cbiesinger)
Comment 41•18 years ago
|
||
Comment on attachment 194053 [details] [diff] [review] patch version 2.3 hm, could you diff with both -p and -u? :) I'm slightly concerned that this will make some valid, or at least used, hostnames invalid... but maybe we can wait until we get bug reports. nsSocketTransport2.cpp + NS_UnescapeURL(mHost.get(), mHost.Length(), + esc_AlwaysCopy, unescapedHost); there's a version of NS_UnescapeURL that takes a string... --- netwerk/base/src/nsStandardURL.cpp 27 Aug 2005 20:24:49 -0000 ! // this function returns PR_TRUE iff it writes something to |result|. ! // this function returns PR_TRUE if it writes something to |result|. why this change? ! NS_UnescapeURL(PromiseFlatCString(host).get(), host.Length(), ! esc_AlwaysCopy, unescapedHost); that second line seems wrongly indented. So with the standardurl change... is the unescaping in the DNS Service still needed? I think one could make a case that callers should unescape if necessary and that DNS service shouldn't have to. I guess that may also need simple uri changes, then. It looks like all your unescapeurl calls misindent the second line... I don't understand the ACE problem with moving this to nsHostResolver... isn't the host already in ACE form by the time the host resolver sees it?
Comment 42•18 years ago
|
||
(In reply to comment #41) > (From update of attachment 194053 [details] [diff] [review] [edit]) > hm, could you diff with both -p and -u? :) I will do next time ... > I'm slightly concerned that this will make some valid, or at least used, > hostnames invalid... but maybe we can wait until we get bug reports. There is a slight risk there, but in pushing this check all the way to or just before the resolver, I think it is very slim. I constructed the set of invalid characters by using the reserved set of characters from rfc 3986 minus "[]:" (for IP addresses) and adding "%\". We could do an even better check by looking for the correct syntax of the hostname (IP4/IP6/Hostname containing '.', '-', digits and numbers) but that would increase the risk. > nsSocketTransport2.cpp > > + NS_UnescapeURL(mHost.get(), mHost.Length(), > + esc_AlwaysCopy, unescapedHost); > > there's a version of NS_UnescapeURL that takes a string... I don't want to use that version, because it uses nsUnescapeCount internaly and I trust this relict from the NN4 codebase not one bit. > --- netwerk/base/src/nsStandardURL.cpp 27 Aug 2005 20:24:49 -0000 > ! // this function returns PR_TRUE iff it writes something to |result|. > ! // this function returns PR_TRUE if it writes something to |result|. > > why this change? It looked like a typo to me, I leave it in. > ! NS_UnescapeURL(PromiseFlatCString(host).get(), host.Length(), > ! esc_AlwaysCopy, unescapedHost); > > that second line seems wrongly indented. I will fix that. > So with the standardurl change... is the unescaping in the DNS Service still > needed? I think one could make a case that callers should unescape if necessary > and that DNS service shouldn't have to. I guess that may also need simple uri > changes, then. Yes, that would be needed and possibly more. I think the resolver should do it just in case, just to be sure. The unescaping in nsStandardURL is also needed, because even if resolving correctly without the unescape, the request is wrong. I stumbled about that when trying to figure out why the long escaped url from comment #10 did not work. It does now. > It looks like all your unescapeurl calls misindent the second line... > > > I don't understand the ACE problem with moving this to nsHostResolver... isn't > the host already in ACE form by the time the host resolver sees it? Yes, I will look at it again. I wanted to move more than the check into nsHostResolver, but that didn't work.
Comment 43•18 years ago
|
||
fixed identions and moved check into nsHostResolver
Attachment #194053 -
Attachment is obsolete: true
Updated•18 years ago
|
Attachment #194053 -
Flags: review?(cbiesinger)
Comment 44•18 years ago
|
||
[iff -> if]
>> why this change?
> It looked like a typo to me, I leave it in.
nah, "iff" here means "if and only if"
Comment 45•18 years ago
|
||
oh, also: when speaking about NS_UnescapeURL, I meant this variant: http://lxr.mozilla.org/seamonkey/source/xpcom/io/nsEscape.h#174 174 inline const nsACString & 175 NS_UnescapeURL(const nsASingleFragmentCString &str, PRUint32 flags, nsACString &result) { 176 const char *temp; 177 if (NS_UnescapeURL(str.BeginReading(temp), str.Length(), flags, result)) 178 return result; 179 return str; 180 }
Comment 46•18 years ago
|
||
Comment on attachment 194065 [details] [diff] [review] version 2.4 (diff -u and diff -p concatenated) (this isn't a -u diff)
Attachment #194065 -
Flags: review+
Updated•18 years ago
|
Attachment #194065 -
Flags: superreview?(bzbarsky)
Comment 47•18 years ago
|
||
I remember why I didn't use nsRefPtr... because it didn't work: nsObjectLoadingContent doesn't implement any AddRef/Release functions, and the ones it inherits are ambiguous.
Comment 48•18 years ago
|
||
(In reply to comment #47) > I remember why I didn't use nsRefPtr... because it didn't work: > nsObjectLoadingContent doesn't implement any AddRef/Release functions, and the > ones it inherits are ambiguous. um, sorry, wrong bug.
Updated•18 years ago
|
Whiteboard: [sg:fix] → [sg:fix][needs SR bzbarsky]
Updated•18 years ago
|
Attachment #194065 -
Flags: review?(darin)
![]() |
||
Comment 49•18 years ago
|
||
> version 2.4 (diff -u and diff -p concatenated)
You want "diff -up8" I think...
![]() |
||
Comment 50•18 years ago
|
||
Comment on attachment 194065 [details] [diff] [review] version 2.4 (diff -u and diff -p concatenated) >Index: netwerk/base/src/nsURLHelper.cpp >+ net_IsValidHostName(const char *hostname, PRUint32 hostnameLen) >+ { >+ const char* end = hostname+hostnameLen; Spaces around the '+', please. >Index: netwerk/base/src/nsStandardURL.cpp >! NS_UnescapeURL(PromiseFlatCString(host).get(), host.Length(), >! esc_AlwaysCopy, unescapedHost); Fix the indent here, please.... I don't really know the ACE stuff enough to meaningfully review the rest of this change, so I hope Darin will look at it before we ship it. ;)
Attachment #194065 -
Flags: superreview?(bzbarsky) → superreview+
Comment 51•18 years ago
|
||
Okay, I incorporated the latest review comments. Same procedure as last time. I will checkin into the trunk.
Comment 52•18 years ago
|
||
Comment 53•18 years ago
|
||
It's hard to know from reading the comments what the current patch actually does - can someone summarise? It may be relevant to point out that we already have a character blacklist for domain names, which currently contains a couple of homographs of "/" and is being expanded over in bug 301694. Gerv
Comment 54•18 years ago
|
||
The latest patch here disallows: /?#@!$&'()*+,;=%\ Hm... I think + should be removed from that list, possible & as well
Comment 55•18 years ago
|
||
This patch: 1. checks for a list of blacklisted ascii characters just before hostname resolution. At that point the hostname is - unescaped - if UTF8 converted to punycode So it contains only ascii characters at this point which makes the list rather small. 2. checks for a list of blacklisted ascii characters when we use a proxy and do no name resolution. Same conditions as with 1. 3. does an additional unescape before the ACE/UTF8 checks to catch hexencoded UTF8 hostnames.
Comment 56•18 years ago
|
||
(In reply to comment #54) > The latest patch here disallows: > /?#@!$&'()*+,;=%\ > > Hm... I think + should be removed from that list, possible & as well Why? Do you have a case where this characters are part of the hostname off an url that is actually resolved?
Comment 57•18 years ago
|
||
Hold it a sec, chaps - we probably need to coordinate a little more closely here. For a start, we need to check whether the IDN and non-IDN lists can be the same check; secondly, Opera have some experience in this area that we should leverage. I have to go out now; I'll write more later. Gerv
Comment 58•18 years ago
|
||
yes, I seem to recall a hostname that contained a +. unfortunately I forgot which one that was...
Comment 59•18 years ago
|
||
I found this about hostnames: [idn] hostname history hell by "Eric A. Hall" <ehall@ehsco.com> The earliest definition I know of for hostnames is rfc608 from 1974. | in which <host-name> will be the official Host Name, a | string obtained through negotiation between the Host and the | NIC, governed by these constraints: | | up to 48 characters drawn from the alphabet (A-Z), | digits (0-9), and the minus sign (-) ... specifically, | no blank or space characters allowed; | | no distinction between upper and lower case letters; | | the first character is a letter; | | the last character is NOT a minus sign; | | no other restrictions on content or syntax. Minimum 1 alpha character, maximum 48 alphanumeric chars, first char alpha (ergo not hyphen), last char alphanumeric. The exclusion of beginning and ending hyphens had to do with syntax of multi-part service-specific names as is more clear later. One side-note is that hyphen is one of the only punctuation characters that is common between all implementations of the ECMA-6 charset. Updated by rfc810 | 1. A "name" (Net, Host, Gateway, or Domain name) is a text string up | to 24 characters drawn from the alphabet (A-Z), digits (0-9), and the | minus sign (-) and period (.). No blank or space characters are | permitted as part of a name. No distinction is made between upper | and lower case. The first character must be a letter. The last | character must not be a minus sign or period. A host which serves as | a GATEWAY should have "-GATEWAY" or "-GW" as part of its name. A | host which is a TIP or a TAC should have "-TIP" or "-TAC" as part of | its host name, if it is an ARPANET or DoD host. Minimum 1 alpha character, maximum 24 alphanumeric chars (half the original length from rfc608), first char alpha (ergo not hyphen), last char alphanumeric. No explicit FQDN lengths defined. Updated by rfc952 | 1. A "name" (Net, Host, Gateway, or Domain name) is a text string up | to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus | sign (-), and period (.). Note that periods are only allowed when | they serve to delimit components of "domain style names". (See | RFC-921, "Domain Name System Implementation Schedule", for | background). No blank or space characters are permitted as part of a | name. No distinction is made between upper and lower case. The first | character must be an alpha character. The last character must not be | a minus sign or period. A host which serves as a GATEWAY should have | "-GATEWAY" or "-GW" as part of its name. Hosts which do not serve as | Internet gateways should not use "-GATEWAY" and "-GW" as part of | their names. A host which is a TAC should have "-TAC" as the last | part of its host name, if it is a DoD host. Single character names | or nicknames are not allowed. Minimum 2 alphanumeric characters (double the original minimum), maximum 24 alphanumeric chars, first char alpha (ergo not hyphen), last char alphanumeric. No explicit FQDN lengths defined. All of the above specifies host names as they appear in hosts databases. These rules were often enforced by /etc/hosts implementations as well (eg, some unix systems have maximum entry lengths of 24 chars in the local /etc/hosts database). rfc1035 expanded the hostname rules implicitly | However, when assigning a domain name for an object, the prudent user | will select a name which satisfies both the rules of the domain system | and any existing rules for the object, whether these rules are published | or implied by existing programs. | The labels must follow the rules for ARPANET host names. They must | start with a letter, end with a letter or digit, and have as interior | characters only letters, digits, and hyphen. There are also some | restrictions on the length. Labels must be 63 characters or less. Note that rfc1035 allows labels (and thus hostnames) to have minimum length of 1 character, with a maximum of 63 characters for each label, and with a fully-qualified domain name (and thus hostnames) to be 255 characters including the separators. Minimum 2 alphanumeric character (inherited from rfc952), maximum 63 alphanumeric chars, first char alpha (ergo not hyphen), last char alphanumeric. Maximum 255 for FQDN. Some unification and clarification by rfc1123 | The syntax of a legal Internet host name was specified in RFC-952 | [DNS:4]. One aspect of host name syntax is hereby changed: the | restriction on the first character is relaxed to allow either a | letter or a digit. Host software MUST support this more liberal | syntax. [first char alphanumeric, but not hyphen] | Host software MUST handle host names of up to 63 characters and | SHOULD handle host names of up to 255 characters. [unified label length of 63 max, unified FQDN of 255, no unified minimum length so minimum of 2 inherited from rfc952 prevails] | Whenever a user inputs the identity of an Internet host, it SHOULD | be possible to enter either (1) a host domain name or (2) an IP | address in dotted-decimal ("#.#.#.#") form. The host SHOULD check | the string syntactically for a dotted-decimal number before | looking it up in the Domain Name System. | | DISCUSSION: | This last requirement is not intended to specify the complete | syntactic form for entering a dotted-decimal host number; | that is considered to be a user-interface issue. For | example, a dotted-decimal number must be enclosed within | "[ ]" brackets for SMTP mail (see Section 5.2.17). This | notation could be made universal within a host system, | simplifying the syntactic checking for a dotted-decimal | number. | | If a dotted-decimal number can be entered without such | identifying delimiters, then a full syntactic check must be | made, because a segment of a host domain name is now allowed | to begin with a digit and could legally be entirely numeric | (see Section 6.1.2.4). However, a valid host name can never | have the dotted-decimal form #.#.#.#, since at least the | highest-level component label will be alphabetic. Note that the last sentence says that the trailing label must contain alpha chars, probably under the assumption that TLDs will be used. The rule is not explicity stated but it is implied strongly. Minimum 2 alphanumeric characters, maximum 63 alphanumeric chars, first char alphanumeric, last char alphanumeric. Maximum 255 for FQDN. FQDN must contain at least one alpha. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/ no '+' allowed by any RFC on the topic and I have not seen it anywhere. '_' seems to be used sometimes in violation of the standards, I left it off the list.
Comment 60•18 years ago
|
||
doesn't look like this is going to make beta1. We'll re-evaluate the risk in the beta2 cycle.
Flags: blocking1.8b4rc+
Updated•18 years ago
|
Attachment #192981 -
Flags: review?(darin)
Updated•18 years ago
|
Whiteboard: [sg:fix][needs SR bzbarsky] → [sg:fix][needs review darin]
Assignee | ||
Comment 61•18 years ago
|
||
For the branch I'd be happy with simply rejecting hostnames that contain '%' characters. I'm leary about unescaping the %-encoded hostnames. Firefox 1.0 does not unescape '%' characters, and we've barely received any complaints about that. In other words, why don't we just take the first patch for the 1.8 branch? Also, I checked that the squid proxy server does not accept escaped characters that form a valid IDN (even when all of the escaped characters are simple ASCII alpha characters). So, it seems to me that unescaping %-escaped IDN can wait.
Comment 62•18 years ago
|
||
Darin, the first patch completly breaks thunderbird and seamonkey-mail because of the internaly used mailbox-urls that contain '%'. You tried the same thing in bug 110938 some time ago. We can loose the change in nsStandardURL.cpp for the branch, those %-encoded IDN-URLs are rare, but they are mentioned in the RFCs, they seem to exist. sqiud seems to be broken the same way mozilla currently is. And then there are the objections of gerv, because there is an effort to create a unicode based black-/whitelist which is expanded in bug 301694. I don't think that effort is the right choice for the problems described in this bug, but maybe there is synergie possible.
Comment 63•18 years ago
|
||
I guess we could also loose the unescapes before every check with IsValidHostName for the branch if they scare you.
Assignee | ||
Comment 64•18 years ago
|
||
Ah right... the mailbox URLs :-( How about instead of rejecting the URL at the parser level, we reject it as the DNS level? We could add code to nsDNSService2.cpp (or nsHostResolver.cpp) that scans the hostname and rejects invalid characters.
Comment 65•18 years ago
|
||
That is exactly what this patch does :-)
Comment 66•18 years ago
|
||
This patch for the 1.8 branch has no unescapes and the list of invalid characters is smaller.
Comment 67•18 years ago
|
||
Opera 7.5 uses the following (courtesy of Yngve Pettersen): "The list used in 7.5x is: 0021-0023; 0025-002C; 002F; Forward slash 003B-0040; 005C; 005E; 007B-007D; 007F; "%" and "/" are the most important of these, since they can impact both visual and machine interpretation of a URI, especially if the servername is converted, and then later parsed again. The others were added because they are not (or should not be) valid in a DNS name. (Hmmm, on second thought, not sure U+002B "+" should be on the list)" Gerv
Comment 68•18 years ago
|
||
Those are more or less the characters I used, Opera is even more restrictive. + should not be in a hostname, excluding it is correct, but biesi remembers having seen at least one host with a +, but does not remember where and when. gerv: Where is the check from bug 301694 located, where does it happen?
Comment 69•18 years ago
|
||
Andreas: as I said in that bug, read the patch attached to bug 283016, which shows exactly what code there is and where :-) As for "+", I don't know if Yngve took it out of the list for Opera 8 or not. I can email him and ask if you like. But I figure if we ban it, people will stop using it pretty quickly. Gerv
Comment 70•18 years ago
|
||
(In reply to comment #69) > Andreas: as I said in that bug, read the patch attached to bug 283016, which > shows exactly what code there is and where :-) Ooops, I missed that being at work and only looking at the one bug number I remembered ... > As for "+", I don't know if Yngve took it out of the list for Opera 8 or not. I > can email him and ask if you like. But I figure if we ban it, people will stop > using it pretty quickly. I think we should block it.
Comment 71•18 years ago
|
||
This version has an extended list of invalid characters in sync with Opera and its own error code NS_ERROR_INVALID_HOST with error texts.
Updated•18 years ago
|
Attachment #194572 -
Flags: superreview?(darin)
Attachment #194572 -
Flags: review?(cbiesinger)
Updated•18 years ago
|
Whiteboard: [sg:fix][needs review darin] → [sg:fix][needs review+SR darin, review cbiesinger]
Comment 72•18 years ago
|
||
Comment on attachment 194572 [details] [diff] [review] This is the deluxe version of the patch mozilla/browser/locales/en-US/chrome/overrides/netError.dtd +<p>The provided hostname is not in a recognized format. If you clicked on a link most likely someone tries to trick you, otherwise check the location bar for mistakes and try again.</p> that should probably be wrapped a bit. looks good otherwise, r=biesi
Attachment #194572 -
Flags: review?(cbiesinger) → review+
Updated•18 years ago
|
Whiteboard: [sg:fix][needs review+SR darin, review cbiesinger] → [sg:fix][needs review+SR darin]
Comment 73•18 years ago
|
||
Do I need an additional review for the changes in mozilla/browser/locales/en-US/chrome/overrides? From Ben Goodger for example?
Comment 74•18 years ago
|
||
(In reply to comment #73) > Do I need an additional review for the changes in > mozilla/browser/locales/en-US/chrome/overrides? No, per http://www.mozilla.org/projects/firefox/review.html, point #1.
Comment 75•18 years ago
|
||
Updated•18 years ago
|
Attachment #195305 -
Flags: superreview?(darin)
Attachment #195305 -
Flags: review?(cbiesinger)
Comment 76•18 years ago
|
||
Comment on attachment 195305 [details] [diff] [review] supplementing the last patch this version of net_IsValidHostName includes all controlcharacters < \x20 hrm, ok... but this is pretty unreadable...
Attachment #195305 -
Flags: review?(cbiesinger) → review+
Comment 77•18 years ago
|
||
(In reply to comment #76) > (From update of attachment 195305 [details] [diff] [review] [edit]) > hrm, ok... but this is pretty unreadable... That's why I put the comment line with the readable characters above the real string with the unreadable stuff. Do you have a better way to have all these control characters in a string and have it still readable? This way it is at least consistent.
Comment 78•18 years ago
|
||
I have no good suggestion, no. maybe darin does :)
Comment 79•18 years ago
|
||
It has been a chore trying to follow the comments here, but I wanted to mention that RFC3490 (http://www.ietf.org/rfc/rfc3490.txt), specifically sections "4. Conversion operations" through "4.2 ToUnicode" (pages 9-11), should help with this bug, especially for those who want to brush up on the subject.
Comment 80•18 years ago
|
||
Ok, I think time is up for me on this one. Even if a superreview arrives now, checking something into the tree by myself so close to my vacation seems unwise. If you want something of this to happen for beta2, somebody else has to do it. What happend so far: - the first take (catching % in the URL-Parser) was a bad idea because of the mailbox urls. - the second take was to catch % and other bad characters in the hostname shortly before DNS resolution. This approach went through several iterations, in the process I found a whole class of hostnames (%-encoded UTF8 hostnames) we currently are unable to handle (but are mentioned in RFC 3986 (3.2.2)). The patches on this bug do the following: 1. check for a list of blacklisted ascii characters just before hostname resolution. At that point the hostname is - unescaped - if UTF8 converted to punycode So it contains only ascii characters at this point which makes the list to check against rather small. 2. check for a list of blacklisted ascii characters when we use a proxy and do no name resolution. Same conditions as with 1. 3. do an additional unescape in nsStandardURL before the ACE/UTF8 checks to catch %-encoded UTF8 hostnames. There are now three different valid patches for this bug: 1) reviewed/superreviewed version for the trunk. This version does what is listed above, but throws an NS_ERROR_UNKNOWN_HOST which is actually only an approximation for the real error. Because of this I don't want to have it on the trunk. https://bugzilla.mozilla.org/attachment.cgi?id=194146 2) conserative branch version. This version does no unescapes and has a reduced list of invalid characters. Also uses NS_ERROR_UNKNOWN_HOST. It will not be able to handle %-encoded UTF8 hostnames. https://bugzilla.mozilla.org/attachment.cgi?id=194475 3) latest version for the trunk. This version has an extended list of invalid characters equal to Opera and its own error code NS_ERROR_INVALID_HOST with error text hinting on phishing. Also it does the unescapes to handle %-encoded UTF8 hostnames. https://bugzilla.mozilla.org/attachment.cgi?id=194572 There also is one more patch, https://bugzilla.mozilla.org/attachment.cgi?id=195305 which adds control-characters to the list of invalid characters for hostnames (essential when we unescape hostname). This patch can be used as replacement in any of the avove three patches for the changes to nsURLHelper.cpp. I would like to have patch 3 (pending superreview, augmented by patch 4) on the trunk and if it behaves have patch 1 (augmented by patch 4) on the branch for (1.8b5).
Comment 81•18 years ago
|
||
Attachment #194572 -
Attachment is obsolete: true
Attachment #195305 -
Attachment is obsolete: true
Comment 82•18 years ago
|
||
Attachment #194065 -
Attachment is obsolete: true
Attachment #194146 -
Attachment is obsolete: true
Assignee | ||
Updated•18 years ago
|
Attachment #195305 -
Flags: superreview?(darin)
Assignee | ||
Updated•18 years ago
|
Attachment #194572 -
Flags: superreview?(darin)
Assignee | ||
Updated•18 years ago
|
Attachment #194065 -
Flags: review?(darin)
Assignee | ||
Comment 84•18 years ago
|
||
Andreas: Thanks for the patches. I'm sorry I wasn't able to review them before your vacation. I'll work on getting the minimal patch into the tree for Firefox 1.5.
Severity: normal → major
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.8beta5
Assignee | ||
Comment 85•18 years ago
|
||
I'm curious why version 2 treats '/' as an invalid character? Since version 2 does not introduce unescaping, it would seem that there is no way for a '/' to end up in a hostname extracted from an URL. Is the check for '/' merely an extra level of precaution in case parsing logic upstream ever changes?
Assignee | ||
Comment 86•18 years ago
|
||
This is Andreas' minimal patch for the 1.8 branch. I applied my review nits to the patch, and so I'd just like one more set of eyes on this patch before I check it in. Biesi: would you please do the honors?
Assignee | ||
Updated•18 years ago
|
Attachment #197088 -
Flags: superreview+
Attachment #197088 -
Flags: review?(cbiesinger)
Updated•18 years ago
|
Attachment #197088 -
Flags: review?(cbiesinger) → review+
Assignee | ||
Updated•18 years ago
|
Attachment #197088 -
Flags: approval1.8b5?
Assignee | ||
Comment 87•18 years ago
|
||
fixed-on-trunk Let's move the enhancements that we want on the trunk into another bug.
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 88•18 years ago
|
||
See bug 309671 and bug 309672 for followup work.
Comment 89•18 years ago
|
||
Trying to verify this with the % laden URL in comment #10. I get a Problem loading page "Server not found" message page. Is that what is supposed to happen or should I have been taken to the same page as the second URL in comment #10?
Assignee | ||
Comment 90•18 years ago
|
||
(In reply to comment #89) > Trying to verify this with the % laden URL in comment #10. I get a Problem > loading page "Server not found" message page. Is that what is supposed to happen > or should I have been taken to the same page as the second URL in comment #10? Tracy, what you are observing is correct and expected behavior. For Firefox 1.5, we are not going to be unescaping % characters found in the hostname part of the URL. (See bug 309671.)
Updated•18 years ago
|
Status: RESOLVED → VERIFIED
Updated•18 years ago
|
Attachment #197088 -
Flags: approval1.8b5? → approval1.8b5+
Reporter | ||
Comment 92•18 years ago
|
||
Bug 340276 makes it sound like this might have regressed...
Reporter | ||
Updated•10 years ago
|
Keywords: csec-spoof,
sec-low
Whiteboard: [sg:fix][needs review+SR darin] → [needs review+SR darin]
Updated•10 years ago
|
Whiteboard: [needs review+SR darin]
You need to log in
before you can comment on or make changes to this bug.
Description
•