Closed Bug 309671 Opened 19 years ago Closed 9 years ago

Support %-escaped hostnames per RFC 3986 (3.2.2) / Cannot open IDN from other applications(e.g., from Thunderbird)

Categories

(Core :: Networking, defect, P3)

defect

Tracking

()

RESOLVED DUPLICATE of bug 412457
mozilla1.9

People

(Reporter: darin.moz, Unassigned)

References

(Depends on 1 open bug, Blocks 2 open bugs, )

Details

(Keywords: helpwanted, intl, Whiteboard: [has-patch])

Attachments

(6 files, 8 obsolete files)

1.66 KB, patch
andreas.otte
: review+
Biesinger
: review+
Details | Diff | Splinter Review
1.94 KB, patch
jshin1987
: review-
Details | Diff | Splinter Review
7.81 KB, patch
beltzner
: review+
neil
: review+
Details | Diff | Splinter Review
1.01 KB, text/html
Details
17.95 KB, patch
masayuki
: review+
Biesinger
: superreview-
Details | Diff | Splinter Review
10.65 KB, patch
Details | Diff | Splinter Review
Support %-escaped hostnames per RFC 3986 (3.2.2)

See discussion and patches in bug 304904.
Keywords: helpwanted
Target Milestone: --- → mozilla1.9alpha
this is actually a duplicate of bug 43659... not marking as such, because this
one has a patch.
Blocks: 43659
*** Bug 311306 has been marked as a duplicate of this bug. ***
Comment on attachment 198453 [details] [diff] [review]
merged patch for bug 309672 and 309671

requesting reviews but be aware that this patch is actually for bugs 309671 and
309672
Attachment #198453 - Flags: superreview?(darin)
Attachment #198453 - Flags: review?(cbiesinger)
Attachment #199605 - Flags: superreview?(darin)
Attachment #199605 - Flags: review?(cbiesinger)
Comment on attachment 198453 [details] [diff] [review]
merged patch for bug 309672 and 309671

mozilla/netwerk/dns/src/nsDNSService2.cpp
+    nsCAutoString unescapedHost; 
+    NS_UnescapeURL(PromiseFlatCString(hostname).get(), hostname.Length(), 
+		    esc_AlwaysCopy, unescapedHost);
+
+    const nsACString *hostPtr = &unescapedHost;

hm, if you have a pointer for the host anyway, seems like you should be able to
avoid the AlwaysCopy pretty easily, right?

mozilla/netwerk/base/src/nsURLHelper.cpp
hmm, I reviewed this in another patch already, didn't I?

mozilla/netwerk/base/src/nsStandardURL.cpp
+    NS_UnescapeURL(PromiseFlatCString(host).get(), host.Length(),
+		    esc_AlwaysCopy, unescapedHost);

why not use this variant of NS_UnescapeURL?
175 NS_UnescapeURL(const nsASingleFragmentCString &str, PRUint32 flags,
nsACString &result) {

mozilla/netwerk/base/src/nsSocketTransport2.cpp
+	     NS_UnescapeURL(mHost.get(), mHost.Length(),
+			    esc_AlwaysCopy, unescapedHost);

same here

r=biesi
Attachment #198453 - Flags: review?(cbiesinger) → review+
Comment on attachment 199605 [details] [diff] [review]
patch for the 1.8 branch, %-escape support only

r=biesi with the same comments as above
Attachment #199605 - Flags: review?(cbiesinger) → review+
Attachment #198453 - Attachment is obsolete: true
Attachment #199605 - Attachment is obsolete: true
Attachment #198453 - Flags: superreview?(darin)
Attachment #199605 - Flags: superreview?(darin)
Comment on attachment 200546 [details] [diff] [review]
updated patch (again a combined version for bugs 309672 and bug 309671)

carrying over biesis review
Attachment #200546 - Flags: superreview?(darin)
Attachment #200546 - Flags: review+
Comment on attachment 200546 [details] [diff] [review]
updated patch (again a combined version for bugs 309672 and bug 309671)

or not ...
Attachment #200546 - Flags: review+ → review?(cbiesinger)
Given the decision in bug 43659 not to fix this for security reasons, are we sure we want to do this now?
RFC 3986 says we should support them. In combination with the work done in bug 304904 and some others (extending the list of blacklisted characters) I think it might be possible to do this now. Of course it might be necessary to extend the blacklist even further. 
addition to the patch above, converting the blacklist to a whitelist, making the hostname check even more restrictive
Attachment #200632 - Flags: superreview?(darin)
Attachment #200632 - Flags: review?(cbiesinger)
That's an enormous improvement. A whitelist-based approach is _much_ better than one based on a blacklist, and the whitelist is human-readable, unlike the previous blacklist, which greatly improves maintainability. 

But why are the characters "[]_" in the whitelist? Can you give me any examples of domain names in practical use that contain these characters, or am I missing something?
OK, I can see from reading that some people may well have underscores in their non-RFC-compliant domain names, and that it's probably harmless to allow them in order not to break anything for these users.

Are the square brackets for domain-literals? 


The brackets are for the IP6-IP-number, the hostname representation in URLs includes then as well as the colon.
Ah. I knew about the use of colons in v6 IP addresses, but I didn't know about the use of square brackets when embedding them in URLs. Now I do.

However, reading RFC 2732 ("Format for Literal IPv6 Addresses in URL's"), I get the clear impression that the brackets are part of the URL syntax, rather than part of the IP address representation itself. Taking the brackets off in URL parsing, before sending the IP address to the DNS code, would eliminate two more characters from the whitelist (a Good Thing), and eliminate the need for extra logic checking against their misuse within the DNS code itself (we probably don't want to send "[[[[.]]]].[]" to a DNS server).
It seems to me that at the stage where we test the hostname (for example if we don't do the DNS lookup ourselves in a proxy situation) the brackets are still there, but I may be wrong here.
Comment on attachment 200632 [details] [diff] [review]
converting blacklist to whitelist

r=biesi
Attachment #200632 - Flags: review?(cbiesinger) → review+
Comment on attachment 200546 [details] [diff] [review]
updated patch (again a combined version for bugs 309672 and bug 309671)

r=biesi

what about these places, don't they need changes for the new error code too:
/directory/xpcom/base/src/nsLDAPConnection.cpp
/extensions/irc/xul/content/handlers.js, line 1757 -- case NS_ERROR_UNKNOWN_HOST:
/mailnews/** (various)
Attachment #200546 - Flags: review?(cbiesinger) → review+
I too looked through all the places where NS_ERROR_UNKNOWN_HOST is currently used, I focused on the browser where this kind of phishing attempts are common. In the other cases a generic error is thrown, it seemed good enough to me should such an error (NS_ERROR_INVALID_HOST) ever encountered in this places in the code.
Comment on attachment 200632 [details] [diff] [review]
converting blacklist to whitelist

>+    // if one of the chars in the host 
>+    // is not one of the following return false
>+    return net_FindCharNotInSet(host.BeginReading(), end,
>+                                ".-0123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ"
>+                                "[]_abcdefghijklmnopqrstuvwxyz") == end;

This is much nicer to read.  Can you wrap the comment block to 80 chars?
I don't think the brackets should be needed.  The hostname passed to the
DNS service and a Socket Transport should all have had the brackets
stripped (in the case of an IPv6 literal).
Attachment #200632 - Flags: superreview?(darin) → superreview-
Comment on attachment 200546 [details] [diff] [review]
updated patch (again a combined version for bugs 309672 and bug 309671)

>Index: mozilla/browser/locales/en-US/chrome/overrides/appstrings.properties

>+invalidHost=%S seems to be an invalid hostname.


>Index: mozilla/browser/locales/en-US/chrome/overrides/netError.dtd

>+<!ENTITY invalidHost.title "Hostname invalid">
>+<!ENTITY invalidHost.longDesc "
>+<p>The provided hostname is not in a recognized format. If you clicked on a 
>+link most likely someone tries to trick you, otherwise check the location 
>+bar for mistakes and try again.</p>
>+">

Grammatical nits:

  "If you clicked on a link, most likely someone is trying to trick you.
   Otherwise, check the location bar for mistakes, and try again."

Perhaps the version in appstrings.properties should match the first
sentence from netError.dtd?

Also, I think we should ask beltzner to review this text.


>Index: mozilla/docshell/base/nsDocShell.cpp

>         NS_ENSURE_ARG_POINTER(aURI);
>         // Get the host
>         nsCAutoString host;
>         aURI->GetHost(host);
>         CopyUTF8toUTF16(host, formatStrs[0]);
>         formatStrCount = 1;
>         error.AssignLiteral("dnsNotFound");
>     }
>+    else if (NS_ERROR_INVALID_HOST == aError) {
>+        NS_ENSURE_ARG_POINTER(aURI);
>+        // Get the host
>+        nsCAutoString host;
>+        aURI->GetHost(host);
>+        CopyUTF8toUTF16(host, formatStrs[0]);
>+        formatStrCount = 1;
>+        error.AssignLiteral("invalidHost");

The code in the above block is almost identical.  Perhaps 
you should find a way to combine these so they do not
duplicate code so much.


>Index: mozilla/netwerk/base/public/nsNetError.h

> /**
>+ * The hostname is invalid.
>+ */
>+
>+#define NS_ERROR_INVALID_HOST \
>+    NS_ERROR_GENERATE_FAILURE(NS_ERROR_MODULE_NETWORK, 26)

"invalid" is such a vague description for this error code.  A hostname
that cannot be found via DNS could surely be termed invalid as well.
I think you need to specify more clearly what it means for a hostname
to be invalid.  In other words, I'm okay with the term "invalid" so
long as you provide ample documentation about what it means.  You might
want to say that it means that the hostname contains characters that
are considered invalid (i.e., to indicate that the hostname was rejected
before it was even sent to the DNS resolver).


>Index: mozilla/netwerk/base/src/nsStandardURL.cpp

> PRBool
> nsStandardURL::NormalizeIDN(const nsCSubstring &host, nsCString &result)
> {
>+    // First unescape the host to make it possibly UTF8, then check if
>+    // it is ASCII
>     // If host is ACE, then convert to UTF-8.  Else, if host is already UTF-8,
>     // then make sure it is normalized per IDN.

Please make form the comments as a complete paragraph with a period between
sentences:

      // First unescape the host to make it possibly UTF-8, then check if
      // it is ASCII.  If host is ACE, then convert to UTF-8.  Else, if host
      // is already UTF-8, then make sure it is normalized per IDN.
     

>+    // unescape 
>+    nsCAutoString unescapedHost;
>+    NS_UnescapeURL(host, esc_AlwaysCopy, unescapedHost);
>+
>+    // now check if it really is ASCII
>+    if (IsASCII(unescapedHost)) {

For performance reasons, I think we should avoid copying the hostname like
this since in most cases the hostname will not be escaped.  I realize that
the methods in nsEscape.h are not very helpful in this regard, so this is
fine for now.


>Index: mozilla/netwerk/dns/src/nsDNSService2.cpp

>     const nsACString *hostPtr = &hostname;
> 
>+    // first unescape the hostname to catch encodings 
>+    nsCAutoString unescapedHost; 
>+    if (NS_UnescapeURL(PromiseFlatCString(hostname).get(), hostname.Length(),  
>+                       0, unescapedHost)) {
>+        hostPtr = &unescapedHost;
>+    }

So, it turns out that you can just use hostname.BeginReading()
instead of PromiseFlatCString.  bsmedberg just added a version
of BeginReading to nsACString that returns a |const char *|.
NOTE: the result of BeginReading may not be null terminated.


>+    // first unescape the hostname to catch encodings 
>+    nsCAutoString unescapedHost; 
>+    if (NS_UnescapeURL(PromiseFlatCString(hostname).get(), hostname.Length(),  
>+                       0, unescapedHost)) {
>+        hostPtr = &unescapedHost;
>+    }

ditto: use BeginReading instead of PromiseFlatCString.


>Index: mozilla/dom/locales/en-US/chrome/appstrings.properties

>+invalidHost=%S is invalid. Please check the hostname and try again.

>Index: mozilla/dom/locales/en-US/chrome/netError.dtd

>+<!ENTITY invalidHost.title "Hostname Invalid">
>+<!ENTITY invalidHost.longDesc "<p>The provided hostname is not in a recognized format. If you clicked on a link most likely someone tries to trick you, otherwise check the location bar for mistakes and try again.</p>">
>+

Again, these strings need to be reviewed by Mike Beltzner.
Attachment #200546 - Flags: superreview?(darin) → superreview-
Hey Andreas, I took the liberty of hacking on net_IsValidHostName.  I think it might be more efficient to check character ranges instead since in most cases hostnames will be lowercase letters with a couple dots.  (We lowercase hostnames in nsStandardURL.)
Attachment #202439 - Flags: review?(andreas.otte)
Comment on attachment 202439 [details] [diff] [review]
converting blacklist to whitelist (v2)

looks great, much better than my whitelist approach
Attachment #202439 - Flags: review?(andreas.otte) → review+
*** Bug 315666 has been marked as a duplicate of this bug. ***
Comment on attachment 200546 [details] [diff] [review]
updated patch (again a combined version for bugs 309672 and bug 309671)

>  PRBool
>  nsStandardURL::NormalizeIDN(const nsCSubstring &host, nsCString &result)
>  {
> +    // First unescape the host to make it possibly UTF8, then check if
> +    // it is ASCII
>      // If host is ACE, then convert to UTF-8.  Else, if host is already UTF-8,
>      // then make sure it is normalized per IDN.
>  
>      // this function returns PR_TRUE iff it writes something to |result|.
>  
>      // NOTE: As a side-effect this function sets mHostEncoding.  While it would
>      // be nice to avoid side-effects in this function, the implementation of
>      // this function is already somewhat bound to the behavior of the
>      // callsites.  Anyways, this function exists to avoid code duplication, so
>      // side-effects abound :-/
>  
>      NS_ASSERTION(mHostEncoding == eEncoding_ASCII, "unexpected default encoding");
>  
> -    if (IsASCII(host)) {
> +    // unescape 
> +    nsCAutoString unescapedHost;
> +    NS_UnescapeURL(host, esc_AlwaysCopy, unescapedHost);
> +
> +    // now check if it really is ASCII
> +    if (IsASCII(unescapedHost)) {
>          PRBool isACE;
>          if (gIDN &&
> -            NS_SUCCEEDED(gIDN->IsACE(host, &isACE)) && isACE &&
> -            NS_SUCCEEDED(ACEtoDisplayIDN(host, result))) {
> +            NS_SUCCEEDED(gIDN->IsACE(unescapedHost, &isACE)) && isACE &&
> +            NS_SUCCEEDED(ACEtoDisplayIDN(unescapedHost, result))) {
>              mHostEncoding = eEncoding_UTF8;
>              return PR_TRUE;
>          }
>      }
>      else {
>          mHostEncoding = eEncoding_UTF8;
> -        if (gIDN && NS_SUCCEEDED(UTF8toDisplayIDN(host, result))) {
> +        if (gIDN && NS_SUCCEEDED(UTF8toDisplayIDN(unescapedHost, result))) {
>              // normalization could result in an ASCII only hostname
>              if (IsASCII(result))
>                  mHostEncoding = eEncoding_ASCII;
>              return PR_TRUE;
>          }
>      }

This code breaks "Copy Link Location" if the host is not encoded by UTF-8.
Please test with third case of attahcment 202340.
My patch of bug 315666 is not breaking it. See my patch that is attahment 202354. In this patch, I don't unescape the host name if it's not UTF-8 string. It's very important.
Sorry.

> Please test with third case of attahcment 202340.
> My patch of bug 315666 is not breaking it. See my patch that is attahment
> 202354.

Please test with third case of attachment 202340 [details].
My patch of bug 315666 is not breaking it. See my patch that is attachment
202354 [review].
This bug is not "enhancement". Because we cannot open IDN from other application without the patch of attahment 200546. And we(Mozilla Japan and JPRS) hope that let's go to 1.8.1.

"enhancement" -> "normal"
Severity: enhancement → normal
Flags: blocking1.8.1?
Keywords: intl
Summary: Support %-escaped hostnames per RFC 3986 (3.2.2) → Support %-escaped hostnames per RFC 3986 (3.2.2) / Cannot open IDN from other applications
why can't other applications pass the non-escaped string to the browser?
(In reply to comment #30)
> why can't other applications pass the non-escaped string to the browser?

# I don't confirmed the URL is escaped by Thunderbird or Firefox.
But I think that using escaped URL is better idea for non-Unicode plattforms(e.g., Windows). On Windows, we aren't using UTF-16 for communication between applications. Therefore, we cannot using all characters for IDN if we use non-escaped string.
Summary: Support %-escaped hostnames per RFC 3986 (3.2.2) / Cannot open IDN from other applications → Support %-escaped hostnames per RFC 3986 (3.2.2) / Cannot open IDN from other applications(e.g., from Thunderbird)
(In reply to comment #27)
> 
> This code breaks "Copy Link Location" if the host is not encoded by UTF-8.
> Please test with third case of attahcment 202340.
> My patch of bug 315666 is not breaking it. See my patch that is attahment
> 202354. In this patch, I don't unescape the host name if it's not UTF-8 string.
> It's very important.

I will look into it and incorporate your stuff 
(In reply to comment #32)
> (In reply to comment #27)
> > 
> > This code breaks "Copy Link Location" if the host is not encoded by UTF-8.
> > Please test with third case of attahcment 202340.
> > My patch of bug 315666 is not breaking it. See my patch that is attahment
> > 202354. In this patch, I don't unescape the host name if it's not UTF-8 string.
> > It's very important.
> 
> I will look into it and incorporate your stuff 
> 

I can get it to show a correct representation in the status bar, but resolving this host does not work. Is it supposed to? This is not clear to me.
I think that we need to change for statusbar for third case.

nsTextToSubURI::convertURItoUnicode
http://lxr.mozilla.org/mozilla/source/intl/uconv/src/nsTextToSubURI.cpp#171

if we can create nsStandardURL from aURI, we need to check as following:
1. if host name has '%' or it is not ASCII, we should return NS_ERROR_FAILURE.
2. if host name is not ASCII and it is UTF-8, we should return NS_ERROR_FAILURE.

These process should be inserted to between #191 and #193.
> 191   }
> 192 
> 193   // empty charset could indicate UTF-8, but aURI turns out not to be UTF-8.
> 194   NS_ENSURE_FALSE(aCharset.IsEmpty(), NS_ERROR_INVALID_ARG);

If we cannot create nsStandardURL, the text may be fragment of URI.
In this case, we shoud go to next step(#194).
> if we can create nsStandardURL from aURI, we need to check as following:
> 1. if host name has '%' or it is not ASCII, we should return NS_ERROR_FAILURE.
> 2. if host name is not ASCII and it is UTF-8, we should return
> NS_ERROR_FAILURE.

Ah, sorry. This comment is wrong.

if we can create nsStandardURL from aURI, we need to check only second.
I.e., if host name is not ASCII and it is UTF-8, we should return NS_ERROR_FAILURE.
Attachment #202583 - Flags: review? → review?(beltzner)
Attachment #200632 - Attachment is obsolete: true
Attachment #202585 - Flags: superreview?(darin)
Attachment #202585 - Flags: review?(masayuki)
(In reply to comment #35)
> > if we can create nsStandardURL from aURI, we need to check as following:
> > 1. if host name has '%' or it is not ASCII, we should return NS_ERROR_FAILURE.
> > 2. if host name is not ASCII and it is UTF-8, we should return
> > NS_ERROR_FAILURE.
> 
> Ah, sorry. This comment is wrong.
> 
> if we can create nsStandardURL from aURI, we need to check only second.
> I.e., if host name is not ASCII and it is UTF-8, we should return
> NS_ERROR_FAILURE.
> 

I'm still not sure I understand what you are trying to say here. Should the third test in your testcase work or not? My guess: it should not, the encoding is wrong. But we should not assume if it is not ASCII it must be UTF8. With the latest version of the patch it now throws a correct INVALID_HOST error, it does no longer choke silently while trying to convert something that is not UTF8 from  UTF8 to ACE.  
Comment on attachment 202585 [details] [diff] [review]
updated rest of the patch incorporating darins comments and Masayuki Nakanos work

Please add comment to nsStandardURL::NormalizeIDN:

"Don't use mOriginCharset for converting the unescaped host name to Unicode. Because if mOriginCharset is not correct, it may become security issue."

Of course, please use better English than mine :-)

With this changing r=masayuki
Attachment #202585 - Flags: review?(masayuki) → review+
(In reply to comment #38)
> (In reply to comment #35)
> > > if we can create nsStandardURL from aURI, we need to check as following:
> > > 1. if host name has '%' or it is not ASCII, we should return NS_ERROR_FAILURE.
> > > 2. if host name is not ASCII and it is UTF-8, we should return
> > > NS_ERROR_FAILURE.
> > 
> > Ah, sorry. This comment is wrong.
> > 
> > if we can create nsStandardURL from aURI, we need to check only second.
> > I.e., if host name is not ASCII and it is UTF-8, we should return
> > NS_ERROR_FAILURE.
> > 
> 
> I'm still not sure I understand what you are trying to say here. Should the
> third test in your testcase work or not? My guess: it should not, the encoding
> is wrong. But we should not assume if it is not ASCII it must be UTF8. With the
> latest version of the patch it now throws a correct INVALID_HOST error, it does
> no longer choke silently while trying to convert something that is not UTF8
> from  UTF8 to ACE.  

I think that we should fail third test.
Becuase that format is not correct for HTML/XHTML spec.

But status bar displays the link target with decoding by document charset.
# if you hover the mouse cursor, you will show "http://日本語.jp" too.
# the host name is decoded by shift_JIS that is charset of the document.
So, status bar displaied link target and actual link target are not same.
We need to fix this conflict.

We should allow to escape by UTF-8 only for host name.
But path, leafname and query parts are not so for compatibility.
Therefore, in nsTextToSubURI::convertURItoUnicode, we need to set the this new limit for host name.
This patch fixes conflict that is status bar vs actual URI.
Attachment #202608 - Flags: review?(jshin1987)
OK, so why is using the origin charset a security issue?
(In reply to comment #42)
> OK, so why is using the origin charset a security issue?
> 

If mOriginCharset is not same actual one and the unescaped host name can be converted by both mOriginCharset and actual one, the user will jump to the host that is decoded by mOriginCharset. If an major web site uses escaped host in a link that is not encoded by non document charset (Note that in this case, the link doesn't work fine on Mozilla for the page author. But on other UA(s), it may work fine.), some people can/may use this coflict. Of course, all IDN usable UAs are same implementation, this anxiety may become waste. But it is desirable.
Comment on attachment 202583 [details] [diff] [review]
separated and updated strings for the new error NS_ERROR_INVALID_HOST

I'm the wrong person to be reviewing this patch. Clearing request.
Attachment #202583 - Flags: review?(beltzner)
Comment on attachment 202583 [details] [diff] [review]
separated and updated strings for the new error NS_ERROR_INVALID_HOST

Erks, sorry; early-morning comprhension error. Here's my review:

>+invalidHost=The provided hostname %S is not in a recognized format.

We've been using "address" instead of "hostname" wherever possible in a first-pass attempt to simplify these messages. Also, we can narrow down what we mean by "recognized format" to let the user know what's not recognized. I suggest:

invalidHost=The address %S contains special characters

>+<!ENTITY invalidHost.title "Hostname invalid">
>+<!ENTITY invalidHost.longDesc "
>+<p>If you clicked on a link, most likely someone is trying to trick you. Otherwise, check the location bar for mistakes, and try again.</p>
>+">

Similarly here, I think we can be a bit stronger in terms of warning users about scams. I'm not even sure we need to tell users to check for errors (I can't think of many ways that a typo could result in this error - are there such possible cases?) I suggest:

invalidHost.title "The address isn't valid"
invalidHost.longDesc "
<ul>
  <li>It is possible that someone is trying to trick you into visiting a phony scam website</li>
  <li>Check the address for mistakes and try again</li>
</ul>"
Attachment #202583 - Flags: review-
>   <li>It is possible that someone is trying to trick you into visiting a phony
> scam website</li>

Caleb points out that:

<li>...trick you into visiting a scam website</li>

is much clearer, and he's right!
I've updated the strings accordingly. I left the ones for seamonkey a bit more technical than the firefox strings. 

A "mistyping" could happen if the wrong encoding is used. Not a scam, just an error.
Attachment #202583 - Attachment is obsolete: true
Attachment #202724 - Flags: review?(beltzner)
Comment on attachment 202724 [details] [diff] [review]
separated and updated strings for the new error NS_ERROR_INVALID_HOST v1.1

Andreas, sorry to nitpick (especially since I forgot these, too) but could you add "." to the end of both of the <li> items? We punctuated all the others, too.
Attachment #202724 - Flags: review?(beltzner) → review-
no problem, we all want this to be as good and as descriptive as possible..
Attachment #202724 - Attachment is obsolete: true
Attachment #202731 - Flags: review?(beltzner)
Comment on attachment 202731 [details] [diff] [review]
separated and updated strings for the new error NS_ERROR_INVALID_HOST v1.2

r=me, with thanks!
Attachment #202731 - Flags: review?(beltzner) → review+
Comment on attachment 202731 [details] [diff] [review]
separated and updated strings for the new error NS_ERROR_INVALID_HOST v1.2

The patches in this bug introduce a new net-error which has strings under mozilla/dom which are only used by seamonkey.
Attachment #202731 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #202731 - Flags: review?(neil.parkwaycc.co.uk)
Attachment #202731 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #202731 - Flags: superreview+
Attachment #202731 - Flags: review?(neil.parkwaycc.co.uk)
Attachment #202731 - Flags: review+
Comment on attachment 202608 [details] [diff] [review]
Patch for nsTextToSubURI rv1.0

Unfortunately, we can't call NS_NewURI here because at least one protocol handler (JSHanlder : see dom/src/jsurl/nsJSProtocolHandler) uses nsITextToSubURI. I learned this hard way when I tried to fix another bug in which I had to handle different parts of a URI differently. One possible hack would be to special-case 'javascript' (which doesn't have 'hostname' part), but it's not future-proof...
Attachment #202608 - Flags: review?(jshin1987) → review-
(In reply to comment #52)
> (From update of attachment 202608 [details] [diff] [review] [edit])
> Unfortunately, we can't call NS_NewURI here because at least one protocol
> handler (JSHanlder : see dom/src/jsurl/nsJSProtocolHandler) uses
> nsITextToSubURI. I learned this hard way when I tried to fix another bug in
> which I had to handle different parts of a URI differently. One possible hack
> would be to special-case 'javascript' (which doesn't have 'hostname' part), but
> it's not future-proof...

If we can (at this point) figure out if the URI is based on nsSimplURI or not we can use GetHost. SimpleURIs like javascript:.... have no host component
Blocks: 316730
(In reply to comment #52)
> (From update of attachment 202608 [details] [diff] [review] [edit])
> Unfortunately, we can't call NS_NewURI here because at least one protocol
> handler (JSHanlder : see dom/src/jsurl/nsJSProtocolHandler) uses
> nsITextToSubURI. I learned this hard way when I tried to fix another bug in
> which I had to handle different parts of a URI differently. One possible hack
> would be to special-case 'javascript' (which doesn't have 'hostname' part), but
> it's not future-proof...

After some more investigation I think this works just fine contrary to Jungshik Shin's view. In case of an javascript URL the GetHost fails and we continue as usual. I don't see a problem here.
(In reply to comment #54)
> (In reply to comment #52)
> > (From update of attachment 202608 [details] [diff] [review] [edit] [edit])
> > Unfortunately, we can't call NS_NewURI here because at least one protocol
> > handler (JSHanlder : see dom/src/jsurl/nsJSProtocolHandler) uses
> > nsITextToSubURI

> After some more investigation I think this works just fine contrary to Jungshik
> Shin's view. In case of an javascript URL the GetHost fails and we continue as
> usual. I don't see a problem here.

How did you test it? If you just used 'javascript' scheme without any non-ASCII characters, it wouldn't even get there. NS_NewURI() in the patch invokes nsJSProtocolHanlder for javascript scheme which in turn invokes UnescapeNonASCIIURI that calls   convertURItoUnicode that calls NS_NewURI, which completes a loop. 


See also bug 229548 and bug 304905
(In reply to comment #56)
> See also bug 229548 and bug 304905

Do you have a test javascript url which got you the inifinite loop? 
Attachment #202439 - Flags: review?(cbiesinger)
Attachment #202439 - Flags: review?(cbiesinger) → review+
Attached patch patch for 1.8 branch (obsolete) — Splinter Review
this patch also includes a change that let us check for a valid hostname with every kind of proxy (bug 318006).
jshin:

I have a better idea for the status bar problem.
I'll file a new bug for fixing it.
Comment on attachment 202585 [details] [diff] [review]
updated rest of the patch incorporating darins comments and Masayuki Nakanos work

Oops. Sorry. My patch is wrong. It's not secure.

In nsStandardURL::NormalizeIDN:

> +    if (!(IsASCII(resolvedHost) || IsUTF8(resolvedHost))) {
> +        resolvedHost = host;
> +    }

We should check the |resolvedHost| is valid or invalid by net_IsValidHostName.

See bug 246804.

If the |resolvedHost| is invalid name, we should use original(escaped) host for nsIURI::GetSpec().

Andreas Otte:

Could you update this patch?
Attachment #202585 - Flags: superreview?(darin)
Attachment #202585 - Flags: review-
Attachment #202585 - Flags: review+
(In reply to comment #59)
> jshin:
> 
> I have a better idea for the status bar problem.
> I'll file a new bug for fixing it.
> 

I'm working on the status bar issue. I'll post proposal patch in few days.
See bug 320807.
(In reply to comment #60)
> (From update of attachment 202585 [details] [diff] [review] [edit])
> Oops. Sorry. My patch is wrong. It's not secure.
> 
> In nsStandardURL::NormalizeIDN:
> 
> > +    if (!(IsASCII(resolvedHost) || IsUTF8(resolvedHost))) {
> > +        resolvedHost = host;
> > +    }
> 
> We should check the |resolvedHost| is valid or invalid by net_IsValidHostName.
> 
> See bug 246804.
> 
> If the |resolvedHost| is invalid name, we should use original(escaped) host for
> nsIURI::GetSpec().
> 
> Andreas Otte:
> 
> Could you update this patch?
> 

I don't think we should check with net_IsValidHostName at this point because there are a lot of internal urls that have something as authority component that is not a hostname. The mailbox-urls come to my mind. Doing the check with net_IsValidHostName could/would possibly break those. We only use net_IsValidHostName just before resolving a hostname.
I think that if there is following URI:

<a href="http://subdomain.com%2f.domain.com/">

a.href returns "http://subdomain.com/.domain.com/" by current patch.
If we use this URI string, some functions may create nsIURI object that having "subdomain.com" in its host name. So, we should warrant following function returns PR_TRUE.

PRBool foo(nsAFlatCString& aSpec)
{
  nsCOMPtr<nsIURI> originURI, newURI;
  NS_NewURI(getter_AddRefs(originURI), aSpec);
  nsCAutoString spec;
  originURI->GetSpec(spec);
  NS_NewURI(getter_AddRefs(newURI), spec);
  PRBool result;
  originURI->Equals(newURI, &result);
  return result;
}
> there are a lot of internal urls that have something as authority component
> that is not a hostname. The mailbox-urls come to my mind. Doing the check with
> net_IsValidHostName could/would possibly break those.

nsStandardURL::NormalizeIDN gets host name only. It's called from nsStandardURL::BuildNormalizedSpec and nsStandardURL::SetHost.
I don't think that mailbox-urls are not broken by here...
(In reply to comment #64)
> > there are a lot of internal urls that have something as authority component
> > that is not a hostname. The mailbox-urls come to my mind. Doing the check with
> > net_IsValidHostName could/would possibly break those.
> 
> nsStandardURL::NormalizeIDN gets host name only. It's called from
> nsStandardURL::BuildNormalizedSpec and nsStandardURL::SetHost.
> I don't think that mailbox-urls are not broken by here...
> 

Yes, they are. We did this twice when we tried to fix this problem the "easy" way and I was able to prevent it a third time. The "hostname" copmponent of the URL is not a real hostname in every case (it can simply be a registry-name, it does not have to follow the rules for hostnames), so checking for a "valid" hostname at this point is the sure way to break something.  
(In reply to comment #63)
> I think that if there is following URI:
> 
> <a href="http://subdomain.com%2f.domain.com/">
> 
> a.href returns "http://subdomain.com/.domain.com/" by current patch.
> If we use this URI string, some functions may create nsIURI object that having
> "subdomain.com" in its host name. So, we should warrant following function
> returns PR_TRUE.

I don't think this happens, at least not by the patch. If it happens, it is a bigger problem. Normally the %2F in http://subdomain.com%2f.domain.com/ is specifically used to hide the / from the URLParser. If not we would end up with a host subdomain.com and a path .domain.com/. But I think we would end up with subdomain.com%2f.domain.com as hostname, at least when using the normal URLParser, and that will be found out when we try to resolve the hostname and first check the hostname on its validity.

What happens when this url is shown in the status-bar is another problem, in my opinion we need an unescape function that unescapes selectivly depending on which component of the URL it is unescaping. But that is beyond the scope of this bug and I would not try to integrate it here.

The scope of this bug is to make %-encoded hostnames work in mozilla and not lessen security by doing so. If we show something in the statusbar that is not 100% perfect but we catch spoofing attempts, it is okay by me for now.
(In reply to comment #66)
> > I think that if there is following URI:
> > 
> > <a href="http://subdomain.com%2f.domain.com/">
> > 
> > a.href returns "http://subdomain.com/.domain.com/" by current patch.
> > If we use this URI string, some functions may create nsIURI object that having
> > "subdomain.com" in its host name. So, we should warrant following function
> > returns PR_TRUE.
> 
> I don't think this happens, at least not by the patch. If it happens, it is a
> bigger problem.

I confirmed this problem is occured on case 3 of testcase(attachment 206800 [details]).
# But network access is failed.(This is good.)
If the host name contains non-ASCII character, the %2F is unescaped too.
This is serious bug.

And I think:
> +    // unescape the host
> +    NS_UnescapeURL(host, esc_AlwaysCopy, resolvedHost);

This code should be:
> +    // unescape the host
> +    NS_UnescapeURL(host, esc_SkipControl | esc_AlwaysCopy, resolvedHost);
                            ^^^^^^^^^^^^^^^^^^
Sorry.
I see, but we can't loose the esc_AlwaysCopy or we would have a potential top crasher on our hand. And I would take it a little bit further and use esc_OnlyNonASCII as addition unescape-mask. This is the mildest unescape currently possible. 
Oops, just saw that you also had the esc_AlwaysCopy in your version. sorry.
Attachment #202585 - Attachment is obsolete: true
Attachment #206804 - Flags: review?(masayuki)
Attachment #204394 - Attachment is obsolete: true
Comment on attachment 206804 [details] [diff] [review]
new version of the patch, changed unescaping

I don't like this patch. Because we have minor bug (We cannot use %HH for alphabet and numeric, this is not valid for RFC 3986).

But I don't have better approach now. And the issue is very minor. If I'll find a good idea, I'll fix it after this bug.

> esc_AlwaysCopy+esc_OnlyNonASCII

Please insert spaces around plus sign. If this is fixed, r=me.
Attachment #206804 - Flags: review?(masayuki) → review+
done. I agree, this is a tradeoff between complete support of %-escaped hostnames for RFC3986 and preventing spoofing attempts.
Attachment #206804 - Flags: superreview?(darin)
See bug 320568 for a small IDN-testing environment that may be useful for testing this patch.
*** Bug 191388 has been marked as a duplicate of this bug. ***
Status: NEW → ASSIGNED
Priority: -- → P2
Not going to block 1.8.1 for this bug.
Flags: blocking1.8.1? → blocking1.8.1-
*** Bug 350556 has been marked as a duplicate of this bug. ***
*** Bug 361816 has been marked as a duplicate of this bug. ***
-> reassign to default owner
Assignee: darin.moz → nobody
Status: ASSIGNED → NEW
QA Contact: benc → networking
Blocks: 387921
As IDN hostnames get more popular we're picking up more and more dupes of this. Affects plugins, too -- for IDN hostnames some of them (Flash for sure) feed us %-encoded utf8 hostnames. Can break streaming, see bug 387921
Flags: blocking1.9?
Going to call this blocking based on a patch that looks ready. If this is not the case, can you renominate for triage?
Flags: blocking1.9? → blocking1.9+
Attachment #206804 - Flags: superreview?(darin.moz) → superreview?(cbiesinger)
Target Milestone: mozilla1.9alpha1 → mozilla1.9 M8
so I should have asked that earlier, but why not do the unescaping only in nsStandardURL and let the socket transport and dns service only deal with unescaped host names? it seems like a waste to do unescaping in all three places.
Comment on attachment 206804 [details] [diff] [review]
new version of the patch, changed unescaping

marking sr- until that question is answered
Attachment #206804 - Flags: superreview?(cbiesinger) → superreview-
Whiteboard: [has-patch]
Comment on attachment 206804 [details] [diff] [review]
new version of the patch, changed unescaping

Andreas, are you still around?  If so, can you update this patch to trunk and address biesi's comment?
No owner yet, moving out to M11.
Priority: P2 → P3
Target Milestone: mozilla1.9 M8 → mozilla1.9 M11
Target Milestone: mozilla1.9beta3 → mozilla1.9beta4
Moving bugs that aren't beta 4 blockers to target final release.
Target Milestone: mozilla1.9beta4 → mozilla1.9
Flags: tracking1.9+ → wanted-next+
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9pre) Gecko/2008041001 SeaMonkey/2.0a1pre

When hovering the mouse over the 3 links in attachment 202340 [details], they all appear the same in the status bar (like the first one, with three kanji; and I suppose it would be DNS-resolved using IDN). Does this mean this bug "works for me"?
you have to try loading them, not just look at the statusbar.
Ah, I see. The first link (kanji) opens the page, the other two (percent-encoded UTF-8 and percent-encoded shift-jis) give a XUL error page.
Blocks: 380555
Maybe it would be useful to know that SharePoint generates references to documents in this form:

http://thesharepoint%2Edomain%2Ecom/some/path/...

They systematically fail. Worse, if you do this:
http://microsoft%2Ecom

FireFox replaces that URL with:
http://www.microsoft.com.com

Seems to me if it was going to do that, it should do it before evaluating the DNS, not after having failed with it.
I have never seen that on the SharePoint sites that I use - ask your admins if they know why it happens like that.
I admit that when it comes to SharePoint it is probably dues to the general incompetence of M$. However, this still remains true:

If you do this:
http://microsoft%2Ecom

FireFox replaces that URL with:
http://www.microsoft.com.com

Should it not either:
1) Not change anything, not replace the %2E at all.
2) Replace the %2E BEFORE processing the DNS.

It seems to me that what it does now is contradictory. If a RFC states it should not change the %2E, as apparently someone seems to think about this one http://www.ietf.org/rfc/rfc1738.txt then it should not be converted. If FireFox is expected to convert it, one would expect it would do so in such a way that it would corrupt what was in the DNS int the first place.

Why should there be a third option that would read:

3) Convert DNS into something it never was.

Thanks.
Depends on: 750587
There are some curious oddities where Firefox alters the URL, *and it even looks good*, but returns Server Not Found.

Try this on for size:
https://twitter%2Ecom/ConanOBrien
Bug 792156 was marked as a duplicate of this bug. But the situation is entirely different. 792156 refers to opening a site from the Mac via LSOpenFromURLSpec and passing a UTF8 encoded hostname. That does not work. Neither does passing a percent-encoded hostname. Passing a UTF8 encoded hostname works with Safari and Chrome. We will be telling our customers to use one of those browsers for international sites until this bug is fixed.
I'm happy enough to work under this bug number. But there are two different situations. I'll post sample code later.

Situation 1:
char* site = "http://www.vihtilä.fi/"
CFURLCreateWithBytes(kCFAllocatorDefault, site, strlen(site), kCFStringEncodingUTF8)
gives: Can’t connect to http://www.vihtil\xc3\xa4.fi [note \xc3\xa4]

Situation 2:
char* site = "http://www.vihtil%C3%A4.fi/"
CFURLCreateWithBytes(kCFAllocatorDefault, site, strlen(site), kCFStringEncodingUTF8)
gives: Can’t connect to http://www.vihtil%C3%A4.fi [note %C3%A4]

In Situation 1 a non-percent-encoded string was evidently in the Apple event. But in any case, FF didn't open the site.
If you have steps to reproduce that give your situation 1, please post those in bug 792156 and reopen that bug....
FWIW, both Safari and Chrome can open these hostnames. Firefox and IE10 cannot.
Blocks: url
While this bug has a lot of activity, the patches were mostly irrelevant at this point.
This issue was fixed in bug 412457.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: