Last Comment Bug 102656 - IDN support for HREF IDN-links
: IDN support for HREF IDN-links
Status: RESOLVED FIXED
: intl
Product: Core
Classification: Components
Component: Internationalization (show other bugs)
: Trunk
: All All
: P2 normal (vote)
: mozilla0.9.9
Assigned To: nhottanscp
: Wil Tan
: Makoto Kato [:m_kato]
Mentors:
http://pg.i-dns.net/~wil/moztest/chib...
Depends on:
Blocks: 102701
  Show dependency treegraph
 
Reported: 2001-10-01 20:36 PDT by Wil Tan
Modified: 2002-04-11 20:22 PDT (History)
9 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
This patch makes a "mixed" URL where the hostname is in UTF8. It is just a proof-of-concept, please suggest optimizations. (1.59 KB, patch)
2001-10-23 10:46 PDT, Wil Tan
no flags Details | Diff | Splinter Review
Moved it inside else block of IsAscii check. More cleanup. (1.71 KB, patch)
2001-10-26 07:27 PDT, Wil Tan
no flags Details | Diff | Splinter Review
FIxed aResult leak, thanks jst. (1.79 KB, patch)
2001-10-30 21:41 PST, Wil Tan
no flags Details | Diff | Splinter Review
fixed using nsIIOService::ExtractUrlPart as suggested by Darin. Made into a separate function. (2.29 KB, patch)
2001-11-05 06:43 PST, Wil Tan
no flags Details | Diff | Splinter Review
fixed nsCOMPtr going out of scope, and checking for nsresult return for NS_NewURI instead. (2.29 KB, patch)
2001-11-05 15:33 PST, Wil Tan
no flags Details | Diff | Splinter Review
Here's a version of ConvertHostnameToUTF8 that takes an nsXPIDLCString. (2.17 KB, patch)
2001-11-06 16:38 PST, Wil Tan
darin.moz: superreview+
Details | Diff | Splinter Review
targetSpec => char ** (2.19 KB, patch)
2001-11-07 04:09 PST, Wil Tan
jst: review+
darin.moz: superreview+
Details | Diff | Splinter Review
Incorporated many of the comments give yesterday. (3.05 KB, patch)
2001-11-09 15:49 PST, Wil Tan
no flags Details | Diff | Splinter Review
removed the call to ExtractScheme(), thanks Darin. (2.68 KB, patch)
2001-11-10 07:27 PST, Wil Tan
darin.moz: review+
Details | Diff | Splinter Review
cleanup as suggested by Johnny. (2.76 KB, patch)
2001-11-16 02:14 PST, Wil Tan
no flags Details | Diff | Splinter Review
removed fake error code return when hostname is ascii. (2.80 KB, patch)
2001-11-16 02:57 PST, Wil Tan
no flags Details | Diff | Splinter Review
*targetSpec = spec // when hostname IsAscii, caller no longer uses nsXPIDLCString (3.30 KB, patch)
2001-11-16 18:52 PST, Wil Tan
no flags Details | Diff | Splinter Review
*targetSpec = nsnull when hostname is ascii, as suggested by Darin. (3.01 KB, patch)
2001-11-20 07:19 PST, Wil Tan
jst: review+
darin.moz: superreview+
Details | Diff | Splinter Review
2 changes suggested by Darin and Johnny, i.e. use newSpec.IsEmpty() and moved *targetSpec = nsnull to top. (3.01 KB, patch)
2001-11-20 21:08 PST, Wil Tan
no flags Details | Diff | Splinter Review

Description Wil Tan 2001-10-01 20:36:20 PDT
ToLowerCase
(http://lxr.mozilla.org/mozilla/source/netwerk/base/src/nsURLHelper.cpp#176)
is being called to downcase the hostnames for the internal representation 
of "host" in nsIURI.
This creates a problem for some internationalized domain names in their 
native encoding (like Shift-JIS, Big5).
Is there a reason why it is done this way?
This is also true in nsStdURLParser and nsNoAuthURLParser.
Comment 1 nhottanscp 2001-10-23 10:22:37 PDT
Another approach is to prevent the native encoding (like Shift-JIS, Big5) host
name and use UTF-8 instead. I think keeping the native encoding in nsIURI is not
a good idea, parsing is difficult, also the host eventualy needs to be unicode
in order to create ACE (ASCII Compatibe Encodeing which is needed for IDN). So,
we want to convert to unicode earlier instead of changing the parser to handle
the native encodings.
This can be done by changing NS_MakeAbsoluteURIWithCharset()in nsHTMLUtils.cpp
where URI for HREF is processed. In the function, the URI is in UCS2, then it is
converted to a doucment charset. Instead, we can keep the host name as unicode
(UTF-8). 
William, could you attach your patch for nsHTMLUtils.cpp?
Comment 2 Wil Tan 2001-10-23 10:46:13 PDT
Created attachment 54705 [details] [diff] [review]
This patch makes a "mixed" URL where the hostname is in UTF8. It is just a proof-of-concept, please suggest optimizations.
Comment 3 nhottanscp 2001-10-23 14:19:32 PDT
In the function, there is ASCII check for URI spec. Also, some cases already use
UTF-8 URI. So, we only need to set UTF-8 host name when the URI is converted to
a doucment charset.

http://lxr.mozilla.org/seamonkey/source/content/shared/src/nsHTMLUtils.cpp#149

149 if (encoder) {
150 // Got the encoder: let's party.

Comment 4 Wil Tan 2001-10-25 18:40:43 PDT
I will update the patch within a day or two.
I agree with Naoki that nsIURI should be given a UTF8 hostname in the first place.
Is there any other entry-point for creating a URL other than 
NS_MAkeAbsoluteURIWithCharset?


* CC'ing Darin.
Comment 5 Darin Fisher 2001-10-25 18:44:51 PDT
so, let me make sure i understand things correctly... if the standard url
implementation simply avoids lowercasing the hostname, then we should be OK, right?
Comment 6 Darin Fisher 2001-10-25 18:47:32 PDT
oh.. wait.. there is one more problem.  namely that of IPv6 address literals
which can contain ':' chars.  we currently strchr for ':' and then wrap the 
hostname in []'s.  this is defined in RFC 2732.  won't this be a problem if
hostname is UTF-8?  or, perhaps i just need to use some UTF-8 aware strchr??
or, use nsCRT::IsAscii first??
Comment 7 Wil Tan 2001-10-26 07:27:42 PDT
Created attachment 55232 [details] [diff] [review]
Moved it inside else block of IsAscii check. More cleanup.
Comment 8 Wil Tan 2001-10-26 07:37:36 PDT
Darin: I'm not sure I understand your concerns.
If we avoid lowercasing the hostname, it would work but it's not an ideal solution 
since the hostname would be in local encoding. Many interfaces are still using C 
strings which means that the hostnames are passed in its raw local charset.

IPv6 hostnames are be in ASCII, so encoding it into UTF8 should have no effect.
There is also no problem with ':' clashing since if ':' appears in UTF8 then it
is the bona fide ':'.
Comment 9 Darin Fisher 2001-10-26 12:01:06 PDT
ok, thanks for explaining why we don't need to worry about embedded colons.

but, what i still don't understand is why removing the code which lowercase's
the hostname is not sufficient.  doesn't a UTF-8 encoded string carry all the
information needed to perform the IDN conversion?  and isn't that all we really
need at the networking layer?
Comment 10 Wil Tan 2001-10-26 14:00:32 PDT
If the hostname in URI was UTF8 encoded, ToLowerCase wouldn't have any effect 
on it, and we would be fine. But it is not, it is converted to the document
charset in NS_MakeAbsoluteURI. And since it is represented as a C string 
it is just as good sequence of raw bytes.
It works with i-DNS.net's software because we use guesswork to "detect" the 
encoding, but would not be elegant. But that is really just legacy support for 
non unicode-aware applications.
Comment 11 Darin Fisher 2001-10-26 14:14:49 PDT
ok, so it sounds to me as though we can/should leave necko alone, and
concentrate on making sure that all clients of nsIURI pass in UTF-8 hostnames. 
right?
Comment 12 Wil Tan 2001-10-26 14:24:02 PDT
Are there many such places that instantiates nsIURI's?
I wouldn't exclude the possibility of having nsStdURL::SetSpec doing IDN 
conversion so that other places (like the current HTTP and DNS modules) do not
need to worry about it. So, nsIURI::GetHost gives the ACE hostname, nsIURI::GetSpec
gives the URL string with UTF8 hostname, and a separate interface for getting
the UTF8 hostname for display (such as the status bar).
Is that reasonable, or is that silly?
Comment 13 Darin Fisher 2001-10-26 15:36:27 PDT
well, i'm not sure if GetSpec should return a UTF-8 hostname or an ACE encoded
hostname.  i say this because when speaking to a HTTP proxy, we need to send the
spec with an ACE encoded hostname.  perhaps nsIURI is not the right stopgap for
IDN conversion then.  i'm really not sure.
Comment 14 Wil Tan 2001-10-26 16:07:53 PDT
Oh yes, forgot about proxies. So let's forget about doing IDN in nsIURI, at least
for now.
Comment 15 Darin Fisher 2001-10-26 16:13:48 PDT
ok
Comment 16 Wil Tan 2001-10-30 21:41:53 PST
Created attachment 55882 [details] [diff] [review]
FIxed aResult leak, thanks jst.
Comment 17 Darin Fisher 2001-11-04 22:30:06 PST
you can also use nsIIOService::extractUrlPart to parse out the hostname of the
UTF8 spec.  this would eliminate the need to create a second uri object.
Comment 18 Wil Tan 2001-11-05 06:43:00 PST
Created attachment 56549 [details] [diff] [review]
fixed using nsIIOService::ExtractUrlPart as suggested by Darin. Made into a separate function.
Comment 19 Darin Fisher 2001-11-05 12:55:22 PST
Comment on attachment 56549 [details] [diff] [review]
fixed using nsIIOService::ExtractUrlPart as suggested by Darin. Made into a separate function.

>Index: content/shared/src/nsHTMLUtils.cpp


> 
>+/*
>+ * Extracts the hostname part of the given targetSpec,
>+ * encodes it to UTF8, and replace it in-place.
>+ */
>+nsresult
>+ConvertHostnameToUTF8(nsCAutoString& targetSpec,
>+                      const nsString& uSpec,
>+                      nsIIOService *aIOService)
>+{
>+  nsresult rv;
>+
>+  if (!aIOService) {
>+    nsCOMPtr<nsIIOService> serv;
>+    serv = do_GetIOService(&rv);
>+    if (NS_FAILED(rv)) return rv;
>+    aIOService = serv.get();
>+  }

this block of code won't work... the destructor for |serv| would have already
been called by the time you exit this block of code.  you need to give the 
nsCOMPtr declaration function scope.


>+  // Try creating a URI from the original targetSpec
>+  nsCOMPtr<nsIURI> uri;
>+  NS_NewURI(getter_AddRefs(uri), targetSpec);
>+  // This will fail for non-absolutely URIs, in that case we just ignore it.
>+  if (!uri) return NS_OK;

capturing rv is better (in the XPCOM sense) than checking for a non-null uri.


>+
>+  // convert the original spec into UTF8
>+  nsCAutoString specUTF8 = NS_ConvertUCS2toUTF8(uSpec);
>+  // Extract the UTF8 hostname
>+  nsXPIDLCString hostUTF8;
>+  rv = aIOService->ExtractUrlPart(specUTF8.get(), nsIIOService::url_Host, NULL, NULL, getter_Copies(hostUTF8));
>+  NS_ASSERTION(NS_SUCCEEDED(rv), "Unable to extract Hostname part from URL");
>+  if (NS_FAILED(rv)) return rv;
>+  // replace the original hostname with hostUTF8
>+  uri->SetHost(hostUTF8.get());
>+
>+  nsXPIDLCString tmp;
>+  uri->GetSpec(getter_Copies(tmp));
>+  // replace the targetSpec
>+  targetSpec = tmp;
>+  return NS_OK;
>+}
> 
> nsresult
> NS_MakeAbsoluteURIWithCharset(char* *aResult,
>@@ -165,6 +204,9 @@
>           encoder->Finish(p, &len);
>           p[len] = 0;
>           spec += p;


what if |aSpec| contains only ascii characters?  adding an IsAscii check
would probably be an optimization here, since NS_NewURI is expensive.

>+
>+          // iDNS support: encode the hostname in the URL to UTF8
>+          ConvertHostnameToUTF8(spec, aSpec, aIOService);
>           
>           if (p != buf)
>             delete[] p;
>@@ -174,8 +216,8 @@
>           // by default.
>           spec = NS_ConvertUCS2toUTF8(aSpec.get());
>         }
>-
>       }
>+
>       // Now we need to URL-escape the string. 
>       // XXX andreas.otte has warned that using the nsIIOService::Escape
>       // method in this way may be too conservative (e.g., it won't
Comment 20 Wil Tan 2001-11-05 15:33:36 PST
Created attachment 56618 [details] [diff] [review]
fixed nsCOMPtr going out of scope, and checking for nsresult return for NS_NewURI instead.
Comment 21 Wil Tan 2001-11-05 15:36:58 PST
Thanks Darin!

> what if |aSpec| contains only ascii characters?  adding an IsAscii check
> would probably be an optimization here, since NS_NewURI is expensive.

There is an IsAscii check at the beginning of NS_MakeAbsoluteURIWithCharset()
so this function will only be called when (!IsAscii(aSpec.get()).
Comment 22 Darin Fisher 2001-11-05 15:41:33 PST
does |spec| need to be a nsCAutoString?  why not just declare it |char *spec| to 
avoid an extra strcpy/strdup in ConvertHostnameToUTF8?
Comment 23 Wil Tan 2001-11-05 15:58:19 PST
It's already an nsCAutoString in NS_MakeAbsoluteURIWithCharset, so just pass it
to ConvertHostnameToUTF8 so that it can be updated.
If ConvertHostnameToUTF8 uses |char *| wouldn't it be messy as well? I don't
get it, please elaborate.
Comment 24 Darin Fisher 2001-11-05 17:57:03 PST
well, when you call GetSpec, you end up malloc'ing a copy of the url spec, but
then you copy this string into |nsCAutoString &targetSpec| passed in by the
user.  does the caller need this to be a nsCAutoString, or could they use
nsXPIDLCString?  if the caller could be modified to use nsXPIDLCString then you
could avoid making the additional string copy.  make sense?
Comment 25 Wil Tan 2001-11-05 19:10:32 PST
Ok, now I see where the extra strdup is. But still don't think that the caller 
can be (easily) modified to use nsXPIDLCString because it uses many methods in
nsCString (eg. AssignWithConversion, FindChar) and uses NS_ConvertUCS2toUTF8,
which is derived from nsCAutoString.

Naoki, any idea?

I'm still trying to understand the entire hierarchy of string classes, wished 
the inheritance tree was simpler.
Comment 26 nhottanscp 2001-11-06 11:12:06 PST
I am not sure, changing the caller seems to involve a lot of changes. 
How about using nsXPIDLCString as an argument and remove the string copy out of
ConvertHostnameToUTF8? Later, the caller may change to use nsXPIDLCString.

BTW, is ConvertHostnameToUTF8 a local function?
Comment 27 Darin Fisher 2001-11-06 11:35:27 PST
yeah, i agree with nhotta: if you can't easily change the caller, then at least
make the caller responsible for the additional strdup.
Comment 28 Wil Tan 2001-11-06 16:28:35 PST
Yes it's a local function, at least for now.
Comment 29 Wil Tan 2001-11-06 16:38:04 PST
Created attachment 56810 [details] [diff] [review]
Here's a version of ConvertHostnameToUTF8 that takes an nsXPIDLCString.
Comment 30 Darin Fisher 2001-11-07 01:20:41 PST
Comment on attachment 56810 [details] [diff] [review]
Here's a version of ConvertHostnameToUTF8 that takes an nsXPIDLCString.

>Index: content/shared/src/nsHTMLUtils.cpp

>+nsresult
>+ConvertHostnameToUTF8(nsXPIDLCString& targetSpec,

this is fine, but you could also pass |char **targetSpec|, and then use
getter_Copies at the callsite of ConvertHostnameToUTF8.

either way, sr=darin
Comment 31 Wil Tan 2001-11-07 01:37:27 PST
Yep I think that's a good idea, I'll upload another soon.
hang on, if sr=darin, then r= who?
Johnny did say that he would sr= if I get an r=, please advise.
Comment 32 Wil Tan 2001-11-07 04:09:31 PST
Created attachment 56867 [details] [diff] [review]
targetSpec => char **
Comment 33 nhottanscp 2001-11-07 10:08:57 PST
I think having two sr= is just fine, no r= needed for that case.
William, could you reassign the bug to yourself since you are actually working on?
I can help check in the file.
Comment 34 Wil Tan 2001-11-07 10:54:36 PST
Reassigning to myself.
Comment 35 Darin Fisher 2001-11-07 13:34:41 PST
Comment on attachment 56867 [details] [diff] [review]
targetSpec => char **

sr=darin (you can use this as an r= if you prefer)
Comment 36 Wil Tan 2001-11-07 17:39:56 PST
Is 0.9.6 a realistic target for this? 
Comment 37 nhottanscp 2001-11-08 10:24:07 PST
If you have reviews and ready for check in, please get an approval and check in,
otherwise move to 0.9.7.
Comment 38 Johnny Stenback (:jst, jst@mozilla.com) 2001-11-08 13:43:44 PST
Comment on attachment 56867 [details] [diff] [review]
targetSpec => char **

r/sr=jst
Comment 39 nhottanscp 2001-11-08 15:08:33 PST
move to 0.9.7
Comment 40 David Baron :dbaron: ⌚️UTC-10 2001-11-08 15:43:47 PST
Have you run performance tests on this patch?  IIRC,
NS_MakeAbsoluteURIWithCharset is on one of the most performance-critical paths
in page loading since it is used to canonicalize URIs for lookup in global
history to determine whether links should be colored as visited or unvisited.
Comment 41 David Baron :dbaron: ⌚️UTC-10 2001-11-08 15:45:58 PST
If that's the case, the string usage here looks unacceptably slow.  You could
improve it a bit by using the NS_ConvertUCS2toUTF8 inline rather than copying a
second time by assigning the result into an nsCAutoString.
Comment 42 Darin Fisher 2001-11-08 15:57:01 PST
william: also, what about checking if the hostname is ASCII?  if the hostname
were actually ASCII (padded with zero bytes) wouldn't we be able to skip this
entire function call?
Comment 43 Chris Waterson 2001-11-08 16:01:52 PST
What dbaron says. You ought to understand the performance impact of this,
especially on link-riddled pages (go try some LXR docs, e.g.).
Comment 44 nhottanscp 2001-11-08 16:16:01 PST
The code is only executed when the URL contains non ASCII.
Comment 45 David Baron :dbaron: ⌚️UTC-10 2001-11-08 16:25:50 PST
OK, but it would still be good not to make it too slow (e.g., by fixing the
extra copy I mentioned above).
Comment 46 nhottanscp 2001-11-08 16:29:14 PST
I agree. Willam, please make the change, thanks.
Comment 47 Darin Fisher 2001-11-08 17:02:03 PST
my point was that while the URL may contain non-ascii characters, the hostname
may only contain ascii characters.  it might be a performance win to just scan
the hostname before going through the trouble of instantiating a new nsStdURL.

so, it might be worth it to move the call to ExtractUrlPart above the call to
NS_NewURI.  you could even capture just the hostname offset and length (ie.
don't capture the hostname as a nsXPIDLCString), then if this section of the
UTF-8 spec contains non-ascii characters, you could continue on by calling
NS_NewURI, etc.

but, moreover it seems really unfortunate to have to make an extra copy of the
spec just to support iDNS.  isn't there a better way?  seems like we really need
a way of parsing a unicode URL.  then we could locate the unicode hostname
and check if it is ascii.  if it is ascii, then there is no need to do any more
work.  unfortunately, we don't have a unicode URL parser.

i feel like i should have looked more closely at the patch in context, because
it really is adding substantial work to NS_MakeAbsoluteWithCharset just to
support an edge case.  let's try to come up with something that isn't so costly
for non-IDN links.
Comment 48 Darin Fisher 2001-11-08 17:04:13 PST
what's the format of |spec| when ConvertHostnameToUTF8 is called?  can we use it
to determine if the hostname is really just ascii?
Comment 49 Johnny Stenback (:jst, jst@mozilla.com) 2001-11-08 17:21:18 PST
This will only be triggerd if the href contains non-ASCII data tho, so the
normal case will not be any different from what's on the trunk today. With that
said, if there's a faster way to do this I'm all for doing that...
Comment 50 Johnny Stenback (:jst, jst@mozilla.com) 2001-11-08 17:23:25 PST
While we're at it, we should remove the IsAscii() method from nsHTMLUtils.cpp
and simply call nsCRT::IsAscii().
Comment 51 Darin Fisher 2001-11-09 12:35:45 PST
aren't non-ascii url paths more common than non-ascii hostnames?
Comment 52 Wil Tan 2001-11-09 14:21:39 PST
Thanks to all who commented. A patch is underway.

> You could improve it a bit by using the NS_ConvertUCS2toUTF8 inline rather 
> than copying a second time by assigning the result into an nsCAutoString.

Oh I missed that. Will do.

> my point was that while the URL may contain non-ascii characters, the 
> hostname may only contain ascii characters.  it might be a performance win 
> to just scan the hostname before going through the trouble of instantiating
> a new nsStdURL.

Agreed. I'll address this one too.

> so, it might be worth it to move the call to ExtractUrlPart above the call 
> to NS_NewURI.  you could even capture just the hostname offset and 
> length (ie. don't capture the hostname as a nsXPIDLCString), then if this
> section of the UTF-8 spec contains non-ascii characters, you could continue
> on by calling NS_NewURI, etc.

Ok.

> but, moreover it seems really unfortunate to have to make an extra copy of 
> the spec just to support iDNS.  isn't there a better way?  seems like we 
> really need a way of parsing a unicode URL.  then we could locate the 
> unicode hostname and check if it is ascii.  if it is ascii, then there is 
> no need to do any more work.  unfortunately, we don't have a unicode URL
> parser.

> i feel like i should have looked more closely at the patch in context, 
> because it really is adding substantial work to NS_MakeAbsoluteWithCharset 
> just to support an edge case.  let's try to come up with something that 
> isn't so costly for non-IDN links.

Yeah there should be a better way of doing it. I'm especially concerned about
the other dozens of calls to NS_NewURI. When nsIURI uses unicode internally
then there'd be less troubles. But I think there would be a lot to change.

> what's the format of |spec| when ConvertHostnameToUTF8 is called?  can we use
it to determine if the hostname is really just ascii?

It is basically whatever in the HREF attribute. So it could be relative or
absolute. Are you suggesting an alternative to using ExtractUrlPart?

> While we're at it, we should remove the IsAscii() method from nsHTMLUtils.cpp
and simply call nsCRT::IsAscii()

Agreed. Shall I do it in the same patch?
Comment 53 Wil Tan 2001-11-09 15:49:52 PST
Created attachment 57310 [details] [diff] [review]
Incorporated many of the comments give yesterday.

* construct NS_ConvertUCS2toUTF8 directly.
* additional check for ascii hostname (even though url may be non-ascii) before

  calling NS_NewURI
* removed IsAscii() in favor of nsCRT::IsAscii()
  (the context diff is screwed up, so it looks like ConvertHostnameToUTF8
replaced IsAscii. sorry)

We could have used ExtractUrlPart on the |char *spec| to save the UTF8 
conversion if the hostname is ascii. But i figured ExtractUrlPart is fairly
expensive as well, (for parsing anything beyond the url_Scheme).
Please advise.
Comment 54 Darin Fisher 2001-11-09 17:46:49 PST
Comment on attachment 57310 [details] [diff] [review]
Incorporated many of the comments give yesterday.

don't you still need to UTF8'ize the hostname if it is absolute?
Comment 55 Darin Fisher 2001-11-09 17:47:22 PST
s/it/the spec/
Comment 56 Wil Tan 2001-11-09 18:37:32 PST
> don't you still need to UTF8'ize the hostname if it is absolute?
The rationale is that there may be lots of non-absolute links, so we can save
on the constructing an NS_ConvertUCS2toUTF8. Does it make sense, or is it 
unnecessary?
Comment 57 Darin Fisher 2001-11-09 19:28:41 PST
you can have relative URLs of the form

//hostname/foo/bar.html
Comment 58 Wil Tan 2001-11-09 21:54:29 PST
Sorry, I wasn't aware of that. In that case it that step would be pointless, I
can snip it.
Comment 59 Wil Tan 2001-11-09 21:58:59 PST
Darin, could you please clarify which "it" do you mean in "s/it/the spec/"?
Thanks.
Comment 60 Wil Tan 2001-11-10 07:27:51 PST
Created attachment 57363 [details] [diff] [review]
removed the call to ExtractScheme(), thanks Darin.

Nevermind, I get it now. You're referring to your previous comment :)
Comment 61 Darin Fisher 2001-11-12 19:09:07 PST
Comment on attachment 57363 [details] [diff] [review]
removed the call to ExtractScheme(), thanks Darin.

r/sr=darin
Comment 62 Wil Tan 2001-11-12 21:50:33 PST
Thanks everyone.
Darin: could you please check it in for me? Should I email Asa for approval?
Comment 63 Darin Fisher 2001-11-13 10:54:23 PST
william: you still need another r= or sr= on your latest patch.  then if you
want this for 0.9.6, you'll need to send mail to drivers@mozilla.org for
approval... though i doubt they'll take it.
Comment 64 Wil Tan 2001-11-13 11:01:48 PST
0.9.6 would be too tight, i can wait.
Johnny, would you mind taking a look at the latest patch for me? TIA.
Comment 65 Johnny Stenback (:jst, jst@mozilla.com) 2001-11-16 01:29:49 PST
- In ConvertHostnameToUTF8(), |if| and its statement on different lines, i.e.:

+  if (nsCRT::IsAscii(hostUTF8.get())) return NS_ERROR_ABORT;

on one line is a no no, fix all those.

Also spread the code out a bit to make it more readable.

- In NS_MakeAbsoluteURIWithCharset():

+          if (NS_SUCCEEDED(rv))
+              spec = newSpec.get();

Loose the .get(), all that does is slow things down.

With that, r/sr=jst
Comment 66 Wil Tan 2001-11-16 02:14:06 PST
Created attachment 58090 [details] [diff] [review]
cleanup as suggested by Johnny.

Thanks Johnny.
Does this look better now?
Comment 67 Johnny Stenback (:jst, jst@mozilla.com) 2001-11-16 02:29:18 PST
Yeah, looks way better, but I saw one more thing when looking over it again...

Returning an error from ConvertHostnameToUTF8() just because the host name is
ASCII is just wrong, if it's ASCII, return the host name as is and return NS_OK.
Make the code that calls this method return the error code to the caller in case
ConvertHostnameToUTF8() really fails. Don't attempt to encode non-errors in
error codes...

sr=jst with that fixed, sorry for not catching this the first time.
Comment 68 Wil Tan 2001-11-16 02:57:52 PST
Created attachment 58093 [details] [diff] [review]
removed fake error code return when hostname is ascii.

If hostname is ascii, 2 extra strdup are needed though.
Moved the "delete[] p" portion up so that it doesn't leak in case 
ConvertHostnameToUTF8 fails.
Comment 69 Johnny Stenback (:jst, jst@mozilla.com) 2001-11-16 16:22:17 PST
If you think the double strdup is a problem then make the code not do that, but
don't piggyback on the error code :-)
Comment 70 Wil Tan 2001-11-16 18:52:51 PST
Created attachment 58187 [details] [diff] [review]
*targetSpec = spec // when hostname IsAscii, caller no longer uses nsXPIDLCString

I agree that piggybacking on the error code is wrong.
This alternative piggybacks on the (spec==targetSpec) property,
please suggest any alternative idea you have.

But patch 58093 is still the cleanest way IMHO, can we live with the extra
strdups, Darin?
Comment 71 Darin Fisher 2001-11-19 11:19:34 PST
this looks pretty yuck to me... i mean, now targetSpec is sometimes freshly
malloc'd, and other times it's just a weak reference to an existing string.

how about returning NS_OK and *targetSpec = nsnull?
Comment 72 Wil Tan 2001-11-20 07:19:52 PST
Created attachment 58528 [details] [diff] [review]
*targetSpec = nsnull when hostname is ascii, as suggested by Darin.

Agreed. Why didn't I think of that...
Comment 73 Darin Fisher 2001-11-20 12:36:46 PST
Comment on attachment 58528 [details] [diff] [review]
*targetSpec = nsnull when hostname is ascii, as suggested by Darin.

r/sr=darin, with one change:

if (newSpec.get())

should be

if (!newSpec.IsEmpty())
Comment 74 Wil Tan 2001-11-20 16:43:27 PST
Do I submit another patch for the change, and then bug you for it again?
I'm just confused about how things should go...
Comment 75 Darin Fisher 2001-11-20 17:03:36 PST
william: no need to submit another patch... just get another reviewer to look at
your code, and then make sure to make the change i suggested before checking in.

thx!
Comment 76 Johnny Stenback (:jst, jst@mozilla.com) 2001-11-20 19:11:26 PST
What Darin said, and move the:

+    *targetSpec = nsnull;

from inside the IsAscii() check to the top of the method to make sure you never
return w/o initializing the out parameter.

r=jst
Comment 77 Wil Tan 2001-11-20 21:08:54 PST
Created attachment 58654 [details] [diff] [review]
2 changes suggested by Darin and Johnny, i.e. use newSpec.IsEmpty() and moved *targetSpec = nsnull to top.

Darin, could you checkin for me please? I haven't an account.
I'd just upload this anyway to make your job easier (hopefully it doesn't cause

even more trouble :)
Thanks!
Comment 78 Darin Fisher 2001-11-21 11:29:22 PST
sure, i'll check this in once the tree opens.
Comment 79 Darin Fisher 2001-12-07 12:34:35 PST
fixed-on-trunk
Comment 80 David Baron :dbaron: ⌚️UTC-10 2002-01-09 20:24:48 PST
This patch caused regression bug 118995.
Comment 81 nhottanscp 2002-01-14 10:05:14 PST
Typing IDN in the URL bar is currently broken (see bug 42898). Is this bug (IDN
links in a document) also broken?
Comment 82 nhottanscp 2002-01-15 11:15:32 PST
It does not work after I applied the fix for bug 42898, so I reopen this.
With the fix of bug 42898, IDN from the URL bar works but from HREF it is
escaped. Even with the other patch in bug 42898 which remove the escape
(#64946), the link still not resolved.
Comment 83 nhottanscp 2002-01-15 11:27:41 PST
In the examples in the page,
http://playground.i-dns.net/~wil/moztest/
I see at least one link is ACE encoded but the link is not found. I am not sure
the links in the page are updated.
Willam, could you applied the patches in bug 42898 and check if this bug really
exists or it's the test URL problem?
Comment 84 Wil Tan 2002-01-16 11:42:31 PST
Darin: Thanks for the fix to bug 42898. will try to update my sources and verify
the patch.
Comment 85 Wil Tan 2002-01-17 13:40:25 PST
With darin's patch to remove the nsStdEscape call in nsHTMLUtils:
http://bugzilla.mozilla.org/attachment.cgi?id=64616&action=view
It works.
Comment 86 Darin Fisher 2002-01-22 19:08:09 PST
thx wil...

-> me (to get the patch landed)
Comment 87 Wil Tan 2002-01-24 06:16:27 PST
thanks darin, you'll be landing the patch in 42898 right?

me -> work on display issue.
Comment 88 Darin Fisher 2002-01-24 20:39:42 PST
are you referring to attachment 64616 [details] [diff] [review]?  if so, then yes.
Comment 89 Darin Fisher 2002-01-29 18:48:02 PST
attachment 64616 [details] [diff] [review] has landed.
Comment 90 Darin Fisher 2002-01-29 19:06:17 PST
this bug does not belong to the networking module.  i believe that necko is
doing everything correctly now in terms of supporting IDN.  this bug needs to be
fixed outside of necko.

-> nhotta
Comment 91 nhottanscp 2002-01-30 10:06:12 PST
William, could you try the latest build? I think this is fixed by darin's check
in (attachment 64616 [details] [diff] [review]).
Comment 92 Darin Fisher 2002-01-30 12:13:09 PST
no, if you just hover over a IDN link, you'll get garbage in the status bar. 
however, that may be considered a new bug.  it's up to you :-)
Comment 93 nhottanscp 2002-01-30 17:13:17 PST
The mouse over problem is bug 81024, mark this as fixed.
Comment 94 Wil Tan 2002-02-11 20:30:58 PST
Verified fixed on Linux.
Comment 95 benc 2002-04-11 20:22:52 PDT
qa to william...

Note You need to log in before you can comment on or make changes to this bug.