Last Comment Bug 304904 - Necko should refuse to look up invalid hostnames containing "%"
: Necko should refuse to look up invalid hostnames containing "%"
Status: VERIFIED FIXED
: csectype-spoof, fixed1.8, sec-low
Product: Core
Classification: Components
Component: Networking (show other bugs)
: Trunk
: All All
: -- major with 1 vote (vote)
: mozilla1.8beta5
Assigned To: Darin Fisher
: benc
Mentors:
: 205456 (view as bug list)
Depends on:
Blocks: 325274 140379 205456 246804 309128 309671 309672
  Show dependency treegraph
 
Reported: 2005-08-16 18:35 PDT by Jesse Ruderman
Modified: 2013-06-10 10:45 PDT (History)
16 users (show)
asa: blocking1.8b5+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
it seems to be a simple as this ... (989 bytes, patch)
2005-08-17 12:56 PDT, Andreas Otte
cbiesinger: review+
bzbarsky: superreview+
mscott: approval1.8b4-
Details | Diff | Review
another attempt, this time in the resolver (3.27 KB, patch)
2005-08-25 16:22 PDT, Andreas Otte
no flags Details | Diff | Review
adding to the second attempt this patch fixes our handling of %-encoded UTF8 hostnames (I think) (2.12 KB, patch)
2005-08-26 12:25 PDT, Andreas Otte
no flags Details | Diff | Review
this combined patch now even handles transparent proxys with remote DNS (5.21 KB, patch)
2005-08-26 14:25 PDT, Andreas Otte
bzbarsky: superreview+
Details | Diff | Review
Version (2.1) (5.16 KB, patch)
2005-08-26 22:26 PDT, Andreas Otte
cbiesinger: review-
bzbarsky: superreview+
Details | Diff | Review
patch updated with cbiesingers comments, also includes the unescaping of the hostname in nsStandardURL.cpp (10.96 KB, patch)
2005-08-27 11:43 PDT, Andreas Otte
no flags Details | Diff | Review
patch version 2.3 (11.25 KB, patch)
2005-08-27 13:28 PDT, Andreas Otte
no flags Details | Diff | Review
version 2.4 (diff -u and diff -p concatenated) (19.38 KB, patch)
2005-08-27 16:22 PDT, Andreas Otte
cbiesinger: review+
bzbarsky: superreview+
Details | Diff | Review
for reverence only, current patch after review comments (13.78 KB, patch)
2005-08-28 22:59 PDT, Andreas Otte
no flags Details | Diff | Review
toned down version of the patch for the 1.8 branch (6.70 KB, patch)
2005-08-31 14:50 PDT, Andreas Otte
no flags Details | Diff | Review
This is the deluxe version of the patch (30.26 KB, patch)
2005-09-01 13:17 PDT, Andreas Otte
cbiesinger: review+
Details | Diff | Review
supplementing the last patch this version of net_IsValidHostName includes all controlcharacters < \x20 (1.67 KB, patch)
2005-09-08 12:12 PDT, Andreas Otte
cbiesinger: review+
Details | Diff | Review
updated deluxe patch to prevent bitrot, also contains the supplement patch (30.80 KB, patch)
2005-09-13 14:58 PDT, Andreas Otte
no flags Details | Diff | Review
updated patch version for the 1.8 branch to prevent bitrot (15.21 KB, patch)
2005-09-13 15:34 PDT, Andreas Otte
no flags Details | Diff | Review
Revised final patch for the 1.8 branch (9.29 KB, patch)
2005-09-22 14:03 PDT, Darin Fisher
cbiesinger: review+
darin.moz: superreview+
asa: approval1.8b5+
Details | Diff | Review

Description Jesse Ruderman 2005-08-16 18:35:28 PDT
Firefox should refuse to look up hostnames containing "%", so that it can be
consistently secure against spoofing across platforms. See bug 246804 comment 14.

This can be accomplished with either a whitelist or a blacklist:
- Blacklist "%" and "\".
- Whitelist the legal characters and "_". (Maybe enforce syntax too, but that's
bug 140379.)
Comment 1 Jesse Ruderman 2005-08-16 18:45:24 PDT
This is necessary to prevent spoofing, especially when Firefox is used with mail
clients that don't have a fix for bug 304905.
Comment 2 Andreas Otte 2005-08-16 22:27:37 PDT
This could be done in nsAuthURLParser::ParseUserInfo in
mozilla/netwerk/base/src/nsURLParsers.cpp
Comment 3 Asa Dotzler [:asa] 2005-08-17 11:38:28 PDT
Andreas, do you think you can help us with a patch on this? We'd like to do this
but Darin's out on vacation. 
Comment 4 Andreas Otte 2005-08-17 12:10:27 PDT
This may take a few hours, I first have to get a current tree and build it ...
Comment 5 Asa Dotzler [:asa] 2005-08-17 12:16:49 PDT
Awesome! Thanks, Andreas, for looking into this.
Comment 6 Andreas Otte 2005-08-17 12:56:14 PDT
Created attachment 192981 [details] [diff] [review]
it seems to be a simple as this ...
Comment 7 Andreas Otte 2005-08-17 12:57:51 PDT
This needs testing against a lot of urls, especially IDN-urls
Comment 8 Andreas Otte 2005-08-17 13:27:38 PDT
So far I found no problem ... URLs with % in the hostname however get a nice
little alert box saying "The URL is not valid an can not be loaded."

Comment 9 Andreas Otte 2005-08-17 14:52:31 PDT
However what is done with the patch violates RFC 3986 (3.2.2):

   This specification does not mandate a particular registered name
   lookup technology and therefore does not restrict the syntax of reg-
   name beyond what is necessary for interoperability.  Instead, it
   delegates the issue of registered name syntax conformance to the
   operating system of each application performing URI resolution, and
   that operating system decides what it will allow for the purpose of
   host identification.  A URI resolution implementation might use DNS,
   host tables, yellow pages, NetInfo, WINS, or any other system for
   lookup of registered names.  However, a globally scoped naming
   system, such as DNS fully qualified domain names, is necessary for
   URIs intended to have global scope.  URI producers should use names
   that conform to the DNS syntax, even when use of DNS is not
   immediately apparent, and should limit these names to no more than
   255 characters in length.

   The reg-name syntax allows percent-encoded octets in order to
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   represent non-ASCII registered names in a uniform way that is
   independent of the underlying name resolution technology.  Non-ASCII
   characters must first be encoded according to UTF-8 [STD63], and then
   each octet of the corresponding UTF-8 sequence must be percent-
   encoded to be represented as URI characters.  URI producing
   applications must not use percent-encoding in host unless it is used
   to represent a UTF-8 character sequence.  When a non-ASCII registered
   name represents an internationalized domain name intended for
   resolution via the DNS, the name must be transformed to the IDNA
   encoding [RFC3490] prior to name lookup.  URI producers should
   provide these registered names in the IDNA encoding, rather than a
   percent-encoding, if they wish to maximize interoperability with
   legacy URI resolvers.
Comment 10 Andreas Otte 2005-08-18 10:26:38 PDT
The following URL is a perfectly valid IDN-URL in percent-encoded octet form. We
should be able to convert this to UTF8 and then convert to punicode for DNS
lookup. With the patch in this bug we would make this an invalid URL which it
clearly is not. However even without the patch we choke on it because we do not
convert over UTF8 to punicode for DNS lookup. 
 http://www.%E3%81%BB%E3%82%93%E3%81%A8%E3%81%86%E3%81%AB%E3%81%AA%E3%81%8C%E3%81%84%E3%82%8F%E3%81%91%E3%81%AE%E3%82%8F%E3%81%8B%E3%82%89%E3%81%AA%E3%81%84%E3%81%A9%E3%82%81%E3%81%84%E3%82%93%E3%82%81%E3%81%84%E3%81%AE%E3%82%89%E3%81%B9%E3%82%8B%E3%81%BE%E3%81%A0%E3%81%AA%E3%81%8C%E3%81%8F%E3%81%97%E3%81%AA%E3%81%84%E3%81%A8%E3%81%9F%E3%82%8A%E3%81%AA%E3%81%84.w3.mag.keio.ac.jp/

This is the the url in punicode for which works fine:
 
http://www.xn--n8jaaaaai5bhf7as8fsfk3jnknefdde3fg11amb5gzdb4wi9bya3kc6lra.w3.mag.keio.ac.jp/

It seems to me we can use the patch as an intermittend solution (because we fail
anyway), the real solution would be to try to convert the hostname into UTF8 if
it contains at least one '%'. If that conversion fails, we throw an error. If it
succeeds, we convert to punicode for DNS lookup.
Comment 11 Benjamin Smedberg [:bsmedberg] 2005-08-18 13:05:38 PDT
Comment on attachment 192981 [details] [diff] [review]
it seems to be a simple as this ...

I'm not a netlib peer, bumping to biesi.
Comment 12 Boris Zbarsky [:bz] 2005-08-18 17:35:17 PDT
Comment on attachment 192981 [details] [diff] [review]
it seems to be a simple as this ...

Looks ok to me, but I'd really like darin to take a look at some point
(possibly post-landing) because of the IDN issue, which I don't fully
understand.
Comment 13 Andreas Otte 2005-08-18 23:23:46 PDT
I see the patch as branch-1.8b4-only for a quick solution, that does not make
things worse than they already are (IDN-wise). For the trunk I'm trying to go
with the encoding check, but this may take some time (post darins vacation?)
because I don't know very much about IDN and charset conversion.
Comment 14 Andreas Otte 2005-08-22 13:40:56 PDT
Changing Hardware/OS to All to better describe the scope.
Comment 15 Asa Dotzler [:asa] 2005-08-24 11:56:50 PDT
Please land this on the trunk and we'll get it into the branch if all looks good.
Comment 16 Andreas Otte 2005-08-24 12:52:33 PDT
The preliminary fix has been checked in on the trunk
Comment 17 Mark Banner (:standard8) 2005-08-25 12:10:55 PDT
The patch as checked into trunk breaks both SeaMonkey and Thunderbird - in both
of them the Local Folders directory no longer appears correctly and generates
lots of errors.

I backed this patch out again (locally) and everything was fine.

See bug 305868 and bug 305900 for reports of the regressions.
Comment 18 Andreas Otte 2005-08-25 12:22:41 PDT
There are regressions, this has to be backed out. But that must mean that there
are urls which contain a '%' as hostname. 
Comment 19 Andreas Otte 2005-08-25 12:26:45 PDT
backed out
Comment 20 Scott MacGregor 2005-08-25 12:36:22 PDT
Comment on attachment 192981 [details] [diff] [review]
it seems to be a simple as this ...

minusing for the branch. This caused some very unpleasant regressions
especially in mail.
Comment 21 Andreas Otte 2005-08-25 12:40:34 PDT
This broke mailbox urls looking like mailbox://nobody@Local%20Folders/Junk for
example.
Comment 22 Scott MacGregor 2005-08-25 12:42:19 PDT
(In reply to comment #18)
> There are regressions, this has to be backed out. But that must mean that there
> are urls which contain a '%' as hostname. 

We have plenty of URIs that contain host names with percents in them. The URIs
for accounts like Local Folders, RSS Feeds, etc. all get their spaces escaped
and look like:

i.e. mailbox://nobody@Local%20Folders/Templates
or
mailbox://nobody@RSS%20Feeds/Mozilla%20Feeds
Comment 23 Andreas Otte 2005-08-25 12:53:59 PDT
Yep, I did not think about those. It would have been nice to catch bad hostnames
in the URLParser, but that seems impossible. Most likely we will have to push
this  check into the DNS resolver.
Comment 24 Andreas Otte 2005-08-25 13:22:04 PDT
This is the new plan:

change AsyncResolve and Resolve in nsDNSService2.cpp just before the idn stuff
happens:

- unescape the hostname (new!)
- do the UTF8toACE conversion if necessary (old!)
- check for a list of invalid characters (blablist), if found return error
INVALID_HOSTNAME 


Comment 25 Andreas Otte 2005-08-25 16:22:19 PDT
Created attachment 193862 [details] [diff] [review]
another attempt, this time in the resolver

There may be better ways to do this. This might also not work with proxys, they
seem to bypass the resolver. Please try this patch on windows, please try with 
with proxys. Unescaping the host before checking with IsASCII also seems to do
almost the right thing with %-encoded IDN hostnames.
Comment 26 Andreas Otte 2005-08-25 16:43:51 PDT
Who knows the proxy stuff?
Comment 27 Andreas Otte 2005-08-26 12:25:46 PDT
Created attachment 193954 [details] [diff] [review]
adding to the second attempt this patch fixes our handling of %-encoded UTF8 hostnames (I think)

This can be tested with the given IDN URL
Comment 28 Andreas Otte 2005-08-26 14:25:13 PDT
Created attachment 193972 [details] [diff] [review]
this combined patch now even handles transparent proxys with remote DNS
Comment 29 Andreas Otte 2005-08-26 15:39:46 PDT
Comment on attachment 193972 [details] [diff] [review]
this combined patch now even handles transparent proxys with remote DNS

requesting reviews, this time the local folders are there ;-)
Comment 30 Boris Zbarsky [:bz] 2005-08-26 20:04:56 PDT
Comment on attachment 193972 [details] [diff] [review]
this combined patch now even handles transparent proxys with remote DNS

>Index: netwerk/base/src/nsSocketTransport2.cpp
>+                const char *c, *begin = hostStart.get();
>+                for (c = begin; c != hostEnd.get(); ++c) {

Why do you need |begin| here?  Just start the loop with c = hostStart.get(). 
Same in the other places you have these loops.
Comment 31 Andreas Otte 2005-08-26 22:26:58 PDT
Created attachment 194002 [details] [diff] [review]
Version (2.1)

incorporated bzbarskys comment
Comment 32 Christian :Biesinger (don't email me, ping me on IRC) 2005-08-27 07:48:13 PDT
Comment on attachment 194002 [details] [diff] [review]
Version (2.1)

(could you also diff with -p?)

+		 NS_UnescapeURL(PromiseFlatCString(mHost).get(),
mHost.Length(),

No need for the PromiseFlatCString, mHost is already flat.

I don't like it that the invalid character list is copied into various files,
can we centralize it? maybe a function net_IsValidHostName(const_iterator
start, const_iterator end) in netCore.h or nsURLHelper.h? Could maybe use
net_FindCharInSet too.

+			 return NS_ERROR_UNKNOWN_HOST;

hrm, maybe another error code would be better. A new NS_ERROR_INVALID_HOST?

To avoid the duplication in nsDNSService2.cpp, maybe the code should move to
nsHostResolver::ResolveHost?

This doesn't address HTTP/HTTPS/FTP/Gopher proxies, is that a problem?
Comment 33 Andreas Otte 2005-08-27 09:27:00 PDT
(In reply to comment #32)
> (From update of attachment 194002 [details] [diff] [review] [edit])
> (could you also diff with -p?)
> 
> +		 NS_UnescapeURL(PromiseFlatCString(mHost).get(),
> mHost.Length(),
> 
> No need for the PromiseFlatCString, mHost is already flat.

okay, I will remove that
 
> I don't like it that the invalid character list is copied into various files,
> can we centralize it? maybe a function net_IsValidHostName(const_iterator
> start, const_iterator end) in netCore.h or nsURLHelper.h? Could maybe use
> net_FindCharInSet too.

yes, that might be better. I will try that.
 
> +			 return NS_ERROR_UNKNOWN_HOST;
> 
> hrm, maybe another error code would be better. A new NS_ERROR_INVALID_HOST?

I will try that, but it might not be so easy, because a certain reaction is
needed which seems to be limited to a given list, also localization for the new
message might be needed => to late for 1.8b4

> To avoid the duplication in nsDNSService2.cpp, maybe the code should move to
> nsHostResolver::ResolveHost?

The ACE conversion is integral part of the test which was there before, I don't
know if that can be moved, but I will take a look.
 
> This doesn't address HTTP/HTTPS/FTP/Gopher proxies, is that a problem?

It does not need to. Only those proxys are relevant which delegate the name
resolution to the server, all other cases are addressed by the code in
nsDNSService2.
Comment 34 Christian :Biesinger (don't email me, ping me on IRC) 2005-08-27 09:44:54 PDT
>It does not need to. Only those proxys are relevant which delegate the name
>resolution to the server

the ones I mentioned do delegate name resolution to the server...
Comment 35 Andreas Otte 2005-08-27 11:31:48 PDT
(In reply to comment #34)
> >It does not need to. Only those proxys are relevant which delegate the name
> >resolution to the server
> 
> the ones I mentioned do delegate name resolution to the server...

Where in the code does this show? The only place I could find was in
nsSocketTransport2.cpp with SOCKS5, everything else seems to be resolved locally.
  
Comment 36 Andreas Otte 2005-08-27 11:43:07 PDT
Created attachment 194047 [details] [diff] [review]
patch updated with cbiesingers comments, also includes the unescaping of the hostname in nsStandardURL.cpp

This
Comment 37 Andreas Otte 2005-08-27 11:45:54 PDT
Oops .. this patch still uses NS_ERROR_UNKNOWN_HOST, using a new error code
would be a huge undertaking, just look where NS_ERROR_UNKNOWN_HOST is used in
the code.
Comment 38 Christian :Biesinger (don't email me, ping me on IRC) 2005-08-27 12:08:11 PDT
(In reply to comment #35)
> Where in the code does this show? The only place I could find was in
> nsSocketTransport2.cpp with SOCKS5, everything else seems to be resolved locally.

I think this is in nsSocketTransport2.cpp ResolveHost, which uses SocketHost()
for the hostname, which is the proxy host for the mentioned cases. note
nsSocketTransport::Init:

    if (proxyInfo) {
        mProxyPort = proxyInfo->Port();
        mProxyHost = proxyInfo->Host();
Comment 39 Andreas Otte 2005-08-27 12:29:58 PDT
(In reply to comment #38)
> I think this is in nsSocketTransport2.cpp ResolveHost, which uses SocketHost()
> for the hostname, which is the proxy host for the mentioned cases. note
> nsSocketTransport::Init:
> 
>     if (proxyInfo) {
>         mProxyPort = proxyInfo->Port();
>         mProxyHost = proxyInfo->Host();

I see. SocketHost() is defined as

const nsCString &SocketHost() { return (!mProxyHost.IsEmpty() &&
!mProxyTransparent) ? mProxyHost : mHost; }

So we resolve the proxyhost if mProxyHost is not empty and the proxy is not
transparent, in all other cases we resolve the real host (unless the proxy
supports remoteDNS). The last case I've alredy got.

So if (!mProxyHost.IsEmpty() && !mProxyTransparent) we also need to check mHost.

Comment 40 Andreas Otte 2005-08-27 13:28:17 PDT
Created attachment 194053 [details] [diff] [review]
patch version 2.3

This version also finds the other cases where we use a proxy.
Comment 41 Christian :Biesinger (don't email me, ping me on IRC) 2005-08-27 15:01:15 PDT
Comment on attachment 194053 [details] [diff] [review]
patch version 2.3

hm, could you diff with both -p and -u? :)

I'm slightly concerned that this will make some valid, or at least used,
hostnames invalid... but maybe we can wait until we get bug reports.


nsSocketTransport2.cpp

+	  NS_UnescapeURL(mHost.get(), mHost.Length(),
+			 esc_AlwaysCopy, unescapedHost);

there's a version of NS_UnescapeURL that takes a string...


--- netwerk/base/src/nsStandardURL.cpp	27 Aug 2005 20:24:49 -0000
!     // this function returns PR_TRUE iff it writes something to |result|.
!     // this function returns PR_TRUE if it writes something to |result|.

why this change?



!     NS_UnescapeURL(PromiseFlatCString(host).get(), host.Length(),
!					esc_AlwaysCopy, unescapedHost);

that second line seems wrongly indented.


So with the standardurl change... is the unescaping in the DNS Service still
needed? I think one could make a case that callers should unescape if necessary
and that DNS service shouldn't have to. I guess that may also need simple uri
changes, then.

It looks like all your unescapeurl calls misindent the second line...


I don't understand the ACE problem with moving this to nsHostResolver... isn't
the host already in ACE form by the time the host resolver sees it?
Comment 42 Andreas Otte 2005-08-27 15:46:52 PDT
(In reply to comment #41)
> (From update of attachment 194053 [details] [diff] [review] [edit])
> hm, could you diff with both -p and -u? :)

I will do next time ...
 
> I'm slightly concerned that this will make some valid, or at least used,
> hostnames invalid... but maybe we can wait until we get bug reports.
 
There is a slight risk there, but in pushing this check all the way to or just
before the resolver, I think it is very slim. I constructed the set of invalid
characters by using the reserved set of characters from rfc 3986 minus "[]:"
(for IP addresses) and adding "%\".  

We could do an even better check by looking for the correct syntax of the
hostname (IP4/IP6/Hostname containing '.', '-', digits and numbers) but that
would increase the risk.

> nsSocketTransport2.cpp
> 
> +	  NS_UnescapeURL(mHost.get(), mHost.Length(),
> +			 esc_AlwaysCopy, unescapedHost);
> 
> there's a version of NS_UnescapeURL that takes a string...

I don't want to use that version, because it uses nsUnescapeCount internaly and
I trust this relict from the NN4 codebase not one bit.
 
> --- netwerk/base/src/nsStandardURL.cpp	27 Aug 2005 20:24:49 -0000
> !     // this function returns PR_TRUE iff it writes something to |result|.
> !     // this function returns PR_TRUE if it writes something to |result|.
> 
> why this change?
 
It looked like a typo to me, I leave it in. 

> !     NS_UnescapeURL(PromiseFlatCString(host).get(), host.Length(),
> !					esc_AlwaysCopy, unescapedHost);
> 
> that second line seems wrongly indented.

I will fix that. 

> So with the standardurl change... is the unescaping in the DNS Service still
> needed? I think one could make a case that callers should unescape if necessary
> and that DNS service shouldn't have to. I guess that may also need simple uri
> changes, then.

Yes, that would be needed and possibly more. I think the resolver should do it
just in case, just to be sure. The unescaping in nsStandardURL is also needed,
because even if resolving correctly without the unescape, the request is wrong.
I stumbled about that when trying to figure out why the long escaped url from
comment #10 did not work. It does now.

> It looks like all your unescapeurl calls misindent the second line...
> 
> 
> I don't understand the ACE problem with moving this to nsHostResolver... isn't
> the host already in ACE form by the time the host resolver sees it?

Yes, I will look at it again. I wanted to move more than the check into
nsHostResolver, but that didn't work.
Comment 43 Andreas Otte 2005-08-27 16:22:11 PDT
Created attachment 194065 [details] [diff] [review]
version 2.4 (diff -u and diff -p concatenated)

fixed identions and moved check into nsHostResolver
Comment 44 Christian :Biesinger (don't email me, ping me on IRC) 2005-08-27 16:28:51 PDT
[iff -> if]
>> why this change?
> It looked like a typo to me, I leave it in. 

nah, "iff" here means "if and only if"
Comment 45 Christian :Biesinger (don't email me, ping me on IRC) 2005-08-27 16:29:57 PDT
oh, also: when speaking about NS_UnescapeURL, I meant this variant:
http://lxr.mozilla.org/seamonkey/source/xpcom/io/nsEscape.h#174
174 inline const nsACString &
175 NS_UnescapeURL(const nsASingleFragmentCString &str, PRUint32 flags,
nsACString &result) {
176     const char *temp;
177     if (NS_UnescapeURL(str.BeginReading(temp), str.Length(), flags, result))
178         return result;
179     return str;
180 }
Comment 46 Christian :Biesinger (don't email me, ping me on IRC) 2005-08-27 16:33:28 PDT
Comment on attachment 194065 [details] [diff] [review]
version 2.4 (diff -u and diff -p concatenated)

(this isn't a -u diff)
Comment 47 Christian :Biesinger (don't email me, ping me on IRC) 2005-08-27 17:30:07 PDT
I remember why I didn't use nsRefPtr... because it didn't work:
nsObjectLoadingContent doesn't implement any AddRef/Release functions, and the
ones it inherits are ambiguous.
Comment 48 Christian :Biesinger (don't email me, ping me on IRC) 2005-08-27 17:34:05 PDT
(In reply to comment #47)
> I remember why I didn't use nsRefPtr... because it didn't work:
> nsObjectLoadingContent doesn't implement any AddRef/Release functions, and the
> ones it inherits are ambiguous.

um, sorry, wrong bug.
Comment 49 Boris Zbarsky [:bz] 2005-08-28 15:42:27 PDT
> version 2.4 (diff -u and diff -p concatenated)

You want "diff -up8" I think...
Comment 50 Boris Zbarsky [:bz] 2005-08-28 15:48:01 PDT
Comment on attachment 194065 [details] [diff] [review]
version 2.4 (diff -u and diff -p concatenated)

>Index: netwerk/base/src/nsURLHelper.cpp
>+ net_IsValidHostName(const char *hostname, PRUint32 hostnameLen)
>+ {
>+     const char* end = hostname+hostnameLen; 

Spaces around the '+', please.

>Index: netwerk/base/src/nsStandardURL.cpp

>!     NS_UnescapeURL(PromiseFlatCString(host).get(), host.Length(),
>!                                       esc_AlwaysCopy, unescapedHost);

Fix the indent here, please....

I don't really know the ACE stuff enough to meaningfully review the rest of
this change, so I hope Darin will look at it before we ship it.  ;)
Comment 51 Andreas Otte 2005-08-28 22:47:27 PDT
Okay, I incorporated the latest review comments. Same procedure as last time. I
will checkin into the trunk. 
Comment 52 Andreas Otte 2005-08-28 22:59:57 PDT
Created attachment 194146 [details] [diff] [review]
for reverence only, current patch after review comments
Comment 53 Gervase Markham [:gerv] 2005-08-29 09:02:11 PDT
It's hard to know from reading the comments what the current patch actually does
- can someone summarise?

It may be relevant to point out that we already have a character blacklist for
domain names, which currently contains a couple of homographs of "/" and is
being expanded over in bug 301694.

Gerv
Comment 54 Christian :Biesinger (don't email me, ping me on IRC) 2005-08-29 10:09:27 PDT
The latest patch here disallows:
/?#@!$&'()*+,;=%\

Hm... I think + should be removed from that list, possible & as well
Comment 55 Andreas Otte 2005-08-29 11:07:40 PDT
This patch:

1. checks for a list of blacklisted ascii characters just before hostname
resolution. At that point the hostname is

- unescaped
- if UTF8 converted to punycode

So it contains only ascii characters at this point which makes the list rather
small.

2. checks for a list of blacklisted ascii characters when we use a proxy and do
no name resolution. Same conditions as with 1.

3. does an additional unescape before the ACE/UTF8 checks to catch hexencoded
UTF8 hostnames.
Comment 56 Andreas Otte 2005-08-29 11:09:22 PDT
(In reply to comment #54)
> The latest patch here disallows:
> /?#@!$&'()*+,;=%\
> 
> Hm... I think + should be removed from that list, possible & as well

Why? Do you have a case where this characters are part of the hostname off an
url that is actually resolved?
Comment 57 Gervase Markham [:gerv] 2005-08-29 11:10:38 PDT
Hold it a sec, chaps - we probably need to coordinate a little more closely
here. For a start, we need to check whether the IDN and non-IDN lists can be the
same check; secondly, Opera have some experience in this area that we should
leverage.

I have to go out now; I'll write more later.

Gerv
Comment 58 Christian :Biesinger (don't email me, ping me on IRC) 2005-08-29 13:09:28 PDT
yes, I seem to recall a hostname that contained a +. unfortunately I forgot
which one that was...
Comment 59 Andreas Otte 2005-08-29 13:28:55 PDT
I found this about hostnames:


[idn] hostname history hell

by "Eric A. Hall" <ehall@ehsco.com>


The earliest definition I know of for hostnames is rfc608 from 1974.

| in which <host-name> will be the official Host Name, a
| string obtained through negotiation between the Host and the
| NIC, governed by these constraints:
| 
|    up to 48 characters drawn from the alphabet (A-Z),
|    digits (0-9), and the minus sign (-) ... specifically,
|    no blank or space characters allowed;
| 
|    no distinction between upper and lower case letters;
| 
|    the first character is a letter;
| 
|    the last character is NOT a minus sign;
| 
|    no other restrictions on content or syntax.

Minimum 1 alpha character, maximum 48 alphanumeric chars, first char alpha
(ergo not hyphen), last char alphanumeric. The exclusion of beginning and
ending hyphens had to do with syntax of multi-part service-specific names
as is more clear later.

One side-note is that hyphen is one of the only punctuation characters
that is common between all implementations of the ECMA-6 charset.

Updated by rfc810

| 1. A "name" (Net, Host, Gateway, or Domain name) is a text string up
| to 24 characters drawn from the alphabet (A-Z), digits (0-9), and the
| minus sign (-) and period (.).  No blank or space characters are
| permitted as part of a name.  No distinction is made between upper
| and lower case.  The first character must be a letter.  The last
| character must not be a minus sign or period.  A host which serves as
| a GATEWAY should have "-GATEWAY" or "-GW" as part of its name.  A
| host which is a TIP or a TAC should have  "-TIP" or "-TAC" as part of
| its host name, if it is an ARPANET or DoD host.

Minimum 1 alpha character, maximum 24 alphanumeric chars (half the
original length from rfc608), first char alpha (ergo not hyphen), last
char alphanumeric. No explicit FQDN lengths defined.

Updated by rfc952

| 1. A "name" (Net, Host, Gateway, or Domain name) is a text string up
| to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus
| sign (-), and period (.).  Note that periods are only allowed when
| they serve to delimit components of "domain style names". (See
| RFC-921, "Domain Name System Implementation Schedule", for
| background).  No blank or space characters are permitted as part of a
| name. No distinction is made between upper and lower case.  The first
| character must be an alpha character.  The last character must not be
| a minus sign or period.  A host which serves as a GATEWAY should have
| "-GATEWAY" or "-GW" as part of its name.  Hosts which do not serve as
| Internet gateways should not use "-GATEWAY" and "-GW" as part of
| their names. A host which is a TAC should have "-TAC" as the last
| part of its host name, if it is a DoD host.  Single character names
| or nicknames are not allowed.

Minimum 2 alphanumeric characters (double the original minimum), maximum
24 alphanumeric chars, first char alpha (ergo not hyphen), last char
alphanumeric. No explicit FQDN lengths defined.

All of the above specifies host names as they appear in hosts databases.
These rules were often enforced by /etc/hosts implementations as well (eg,
some unix systems have maximum entry lengths of 24 chars in the local
/etc/hosts database).

rfc1035 expanded the hostname rules implicitly

| However, when assigning a domain name for an object, the prudent user
| will select a name which satisfies both the rules of the domain system
| and any existing rules for the object, whether these rules are published
| or implied by existing programs.

| The labels must follow the rules for ARPANET host names.  They must
| start with a letter, end with a letter or digit, and have as interior
| characters only letters, digits, and hyphen.  There are also some
| restrictions on the length.  Labels must be 63 characters or less.

Note that rfc1035 allows labels (and thus hostnames) to have minimum
length of 1 character, with a maximum of 63 characters for each label, and
with a fully-qualified domain name (and thus hostnames) to be 255
characters including the separators.

Minimum 2 alphanumeric character (inherited from rfc952), maximum 63
alphanumeric chars, first char alpha (ergo not hyphen), last char
alphanumeric. Maximum 255 for FQDN.

Some unification and clarification by rfc1123

| The syntax of a legal Internet host name was specified in RFC-952
| [DNS:4].  One aspect of host name syntax is hereby changed: the
| restriction on the first character is relaxed to allow either a
| letter or a digit.  Host software MUST support this more liberal
| syntax.

[first char alphanumeric, but not hyphen]

| Host software MUST handle host names of up to 63 characters and
| SHOULD handle host names of up to 255 characters.

[unified label length of 63 max, unified FQDN of 255, no unified minimum
length so minimum of 2 inherited from rfc952 prevails]

| Whenever a user inputs the identity of an Internet host, it SHOULD
| be possible to enter either (1) a host domain name or (2) an IP
| address in dotted-decimal ("#.#.#.#") form.  The host SHOULD check
| the string syntactically for a dotted-decimal number before
| looking it up in the Domain Name System.
| 
| DISCUSSION:
|      This last requirement is not intended to specify the complete
|      syntactic form for entering a dotted-decimal host number;
|      that is considered to be a user-interface issue.  For
|      example, a dotted-decimal number must be enclosed within
|      "[ ]" brackets for SMTP mail (see Section 5.2.17).  This
|      notation could be made universal within a host system,
|      simplifying the syntactic checking for a dotted-decimal
|      number.
|      
|      If a dotted-decimal number can be entered without such
|      identifying delimiters, then a full syntactic check must be
|      made, because a segment of a host domain name is now allowed
|      to begin with a digit and could legally be entirely numeric
|      (see Section 6.1.2.4).  However, a valid host name can never
|      have the dotted-decimal form #.#.#.#, since at least the
|      highest-level component label will be alphabetic.

Note that the last sentence says that the trailing label must contain
alpha chars, probably under the assumption that TLDs will be used. The
rule is not explicity stated but it is implied strongly.

Minimum 2 alphanumeric characters, maximum 63 alphanumeric chars, first
char alphanumeric, last char alphanumeric. Maximum 255 for FQDN. FQDN must
contain at least one alpha.

-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/





no '+' allowed by any RFC on the topic and I have not seen it anywhere. '_'
seems to be used sometimes in violation of the standards, I left it off the list. 
Comment 60 Asa Dotzler [:asa] 2005-08-29 15:10:50 PDT
doesn't look like this is going to make beta1. We'll re-evaluate the risk in the
beta2 cycle.
Comment 61 Darin Fisher 2005-08-30 18:54:35 PDT
For the branch I'd be happy with simply rejecting hostnames that contain '%'
characters.  I'm leary about unescaping the %-encoded hostnames.  Firefox 1.0
does not unescape '%' characters, and we've barely received any complaints about
that.

In other words, why don't we just take the first patch for the 1.8 branch?

Also, I checked that the squid proxy server does not accept escaped characters
that form a valid IDN (even when all of the escaped characters are simple ASCII
alpha characters).  So, it seems to me that unescaping %-escaped IDN can wait.
Comment 62 Andreas Otte 2005-08-30 22:23:31 PDT
Darin, the first patch completly breaks thunderbird and seamonkey-mail because
of the internaly used mailbox-urls that contain '%'. You tried the same thing in
bug 110938 some time ago.

We can loose the change in nsStandardURL.cpp for the branch, those %-encoded
IDN-URLs are rare, but they are mentioned in the RFCs, they seem to exist. sqiud
seems to be broken the same way mozilla currently is.

And then there are the objections of gerv, because there is an effort to create
a unicode based black-/whitelist which is expanded in bug 301694. I don't think
that effort is the right choice for the problems described in this bug, but
maybe there is synergie possible.
Comment 63 Andreas Otte 2005-08-30 23:00:09 PDT
I guess we could also loose the unescapes before every check with
IsValidHostName for the branch if they scare you. 
Comment 64 Darin Fisher 2005-08-31 11:43:39 PDT
Ah right... the mailbox URLs :-(  How about instead of rejecting the URL at the
parser level, we reject it as the DNS level?  We could add code to
nsDNSService2.cpp (or nsHostResolver.cpp) that scans the hostname and rejects
invalid characters.
Comment 65 Andreas Otte 2005-08-31 11:59:22 PDT
That is exactly what this patch does :-)
Comment 66 Andreas Otte 2005-08-31 14:50:18 PDT
Created attachment 194475 [details] [diff] [review]
toned down version of the patch for the 1.8 branch

This patch for the 1.8 branch has no unescapes and the list of invalid
characters is smaller.
Comment 67 Gervase Markham [:gerv] 2005-09-01 02:44:08 PDT
Opera 7.5 uses the following (courtesy of Yngve Pettersen):

"The list used in 7.5x is:
   0021-0023;
   0025-002C;
   002F; Forward slash
   003B-0040;
   005C;
   005E;
   007B-007D;
   007F;

"%" and "/" are the most important of these, since they can impact both  visual
and machine interpretation of a URI, especially if the servername  is converted,
and then later parsed again.

The others were added because they are not (or should not be) valid in a  DNS
name. (Hmmm, on second thought, not sure U+002B "+" should be on the  list)"

Gerv
Comment 68 Andreas Otte 2005-09-01 04:54:07 PDT
Those are more or less the characters I used, Opera is even more restrictive. +
should not be in a hostname, excluding it is correct, but biesi remembers having
seen at least one host with a +, but does not remember where and when. 

gerv: Where is the check from bug 301694 located, where does it happen?
 
Comment 69 Gervase Markham [:gerv] 2005-09-01 05:51:59 PDT
Andreas: as I said in that bug, read the patch attached to bug 283016, which
shows exactly what code there is and where :-)

As for "+", I don't know if Yngve took it out of the list for Opera 8 or not. I
can email him and ask if you like. But I figure if we ban it, people will stop
using it pretty quickly.

Gerv
Comment 70 Andreas Otte 2005-09-01 10:08:32 PDT
(In reply to comment #69)
> Andreas: as I said in that bug, read the patch attached to bug 283016, which
> shows exactly what code there is and where :-)

Ooops, I missed that being at work and only looking at the one bug number I
remembered ... 
 
> As for "+", I don't know if Yngve took it out of the list for Opera 8 or not. I
> can email him and ask if you like. But I figure if we ban it, people will stop
> using it pretty quickly.

I think we should block it.

Comment 71 Andreas Otte 2005-09-01 13:17:23 PDT
Created attachment 194572 [details] [diff] [review]
This is the deluxe version of the patch

This version has an extended list of invalid characters in sync with Opera and
its own error code NS_ERROR_INVALID_HOST with error texts.
Comment 72 Christian :Biesinger (don't email me, ping me on IRC) 2005-09-06 09:53:22 PDT
Comment on attachment 194572 [details] [diff] [review]
This is the deluxe version of the patch

mozilla/browser/locales/en-US/chrome/overrides/netError.dtd
+<p>The provided hostname is not in a recognized format. If you clicked on a
link most likely someone tries to trick you, otherwise check the location bar
for mistakes and try again.</p>


that should probably be wrapped a bit.

looks good otherwise, r=biesi
Comment 73 Andreas Otte 2005-09-07 10:01:28 PDT
Do I need an additional review for the changes in
mozilla/browser/locales/en-US/chrome/overrides? 

From Ben Goodger for example?
Comment 74 :Gavin Sharp [email: gavin@gavinsharp.com] 2005-09-07 12:55:07 PDT
(In reply to comment #73)
> Do I need an additional review for the changes in
> mozilla/browser/locales/en-US/chrome/overrides?

No, per http://www.mozilla.org/projects/firefox/review.html, point #1.
Comment 75 Andreas Otte 2005-09-08 12:12:45 PDT
Created attachment 195305 [details] [diff] [review]
supplementing the last patch this version of net_IsValidHostName includes all controlcharacters < \x20
Comment 76 Christian :Biesinger (don't email me, ping me on IRC) 2005-09-08 14:52:41 PDT
Comment on attachment 195305 [details] [diff] [review]
supplementing the last patch this version of net_IsValidHostName includes all controlcharacters < \x20

hrm, ok... but this is pretty unreadable...
Comment 77 Andreas Otte 2005-09-08 15:12:39 PDT
(In reply to comment #76)
> (From update of attachment 195305 [details] [diff] [review] [edit])
> hrm, ok... but this is pretty unreadable...

That's why I put the comment line with the readable characters above the real
string with the unreadable stuff. Do you have a better way to have all these
control characters in a string and have it still readable? This way it is at
least consistent.
Comment 78 Christian :Biesinger (don't email me, ping me on IRC) 2005-09-08 15:43:39 PDT
I have no good suggestion, no. maybe darin does :)
Comment 79 earthsound 2005-09-09 10:56:22 PDT
It has been a chore trying to follow the comments here, but I wanted to mention
that RFC3490 (http://www.ietf.org/rfc/rfc3490.txt), specifically sections "4.
Conversion operations" through "4.2 ToUnicode" (pages 9-11), should help with
this bug, especially for those who want to brush up on the subject.
Comment 80 Andreas Otte 2005-09-13 12:52:39 PDT
Ok, I think time is up for me on this one. Even if a superreview arrives now,
checking something into the tree by myself so close to my vacation seems unwise.
If you want something of this to happen for beta2, somebody else has to do it. 

What happend so far: 

- the first take (catching % in the URL-Parser) was a bad idea because of the
mailbox urls.

- the second take was to catch % and other bad characters in the hostname
shortly before DNS resolution. This approach went through several iterations, in
the process I found a whole class of hostnames (%-encoded UTF8 hostnames) we
currently are unable to handle (but are mentioned in RFC 3986 (3.2.2)).

The patches on this bug do the following:

1. check for a list of blacklisted ascii characters just before hostname
resolution. At that point the hostname is

- unescaped
- if UTF8 converted to punycode

So it contains only ascii characters at this point which makes the list to check
against rather small.

2. check for a list of blacklisted ascii characters when we use a proxy and do
no name resolution. Same conditions as with 1.

3. do an additional unescape in nsStandardURL before the ACE/UTF8 checks to
catch %-encoded UTF8 hostnames.


There are now three different valid patches for this bug:

1) reviewed/superreviewed version for the trunk. This version does what is
listed above, but throws an NS_ERROR_UNKNOWN_HOST which is actually only an
approximation for the real error. Because of this I don't want to have it on the
trunk.

https://bugzilla.mozilla.org/attachment.cgi?id=194146


2) conserative branch version. This version does no unescapes and has a reduced
list of invalid characters. Also uses NS_ERROR_UNKNOWN_HOST. It will not be able
to handle %-encoded UTF8 hostnames.

https://bugzilla.mozilla.org/attachment.cgi?id=194475


3) latest version for the trunk. This version has an extended list of invalid
characters equal to Opera and its own error code NS_ERROR_INVALID_HOST with
error text hinting on phishing. Also it does the unescapes to handle %-encoded
UTF8 hostnames.

https://bugzilla.mozilla.org/attachment.cgi?id=194572


There also is one more patch,

https://bugzilla.mozilla.org/attachment.cgi?id=195305

which adds control-characters to the list of invalid characters for hostnames
(essential when we unescape hostname). This patch can be used as replacement in
any of the avove three patches for the changes to nsURLHelper.cpp.


I would like to have patch 3 (pending superreview, augmented by patch 4) on the
trunk and if it behaves have patch 1 (augmented by patch 4) on the branch for
(1.8b5).
Comment 81 Andreas Otte 2005-09-13 14:58:30 PDT
Created attachment 195952 [details] [diff] [review]
updated deluxe patch to prevent bitrot, also contains the supplement patch
Comment 82 Andreas Otte 2005-09-13 15:34:58 PDT
Created attachment 195959 [details] [diff] [review]
updated patch version for the 1.8 branch to prevent bitrot
Comment 83 Andreas Otte 2005-09-13 23:00:05 PDT
changing owner
Comment 84 Darin Fisher 2005-09-14 17:31:15 PDT
Andreas:  Thanks for the patches.  I'm sorry I wasn't able to review them before
your vacation.  I'll work on getting the minimal patch into the tree for Firefox
1.5.
Comment 85 Darin Fisher 2005-09-22 13:07:51 PDT
I'm curious why version 2 treats '/' as an invalid character?  Since version 2
does not introduce unescaping, it would seem that there is no way for a '/' to
end up in a hostname extracted from an URL.  Is the check for '/' merely an
extra level of precaution in case parsing logic upstream ever changes?
Comment 86 Darin Fisher 2005-09-22 14:03:51 PDT
Created attachment 197088 [details] [diff] [review]
Revised final patch for the 1.8 branch

This is Andreas' minimal patch for the 1.8 branch.  I applied my review nits to
the patch, and so I'd just like one more set of eyes on this patch before I
check it in.  Biesi: would you please do the honors?
Comment 87 Darin Fisher 2005-09-22 14:40:20 PDT
fixed-on-trunk

Let's move the enhancements that we want on the trunk into another bug.
Comment 88 Darin Fisher 2005-09-22 14:48:16 PDT
See bug 309671 and bug 309672 for followup work.
Comment 89 Tracy Walker [:tracy] 2005-09-23 10:43:02 PDT
Trying to verify this with the % laden URL in comment #10. I get a Problem
loading page "Server not found" message page. Is that what is supposed to happen
or should  I have been taken to the same page as the second URL in comment #10?
Comment 90 Darin Fisher 2005-09-23 11:57:51 PDT
(In reply to comment #89)
> Trying to verify this with the % laden URL in comment #10. I get a Problem
> loading page "Server not found" message page. Is that what is supposed to happen
> or should  I have been taken to the same page as the second URL in comment #10?

Tracy, what you are observing is correct and expected behavior.  For Firefox
1.5, we are not going to be unescaping % characters found in the hostname part
of the URL.  (See bug 309671.)
Comment 91 Darin Fisher 2005-09-23 12:36:23 PDT
fixed1.8
Comment 92 Jesse Ruderman 2006-06-03 20:15:30 PDT
Bug 340276 makes it sound like this might have regressed...
Comment 93 Wayne Mery (:wsmwk, NI for questions) 2009-03-31 14:20:43 PDT
*** Bug 205456 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.