Closed Bug 196717 Opened 21 years ago Closed 21 years ago

Decode ACE to UTF-8

Categories

(Core :: Networking, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: nhottanscp, Assigned: nhottanscp)

Details

Attachments

(2 files, 1 obsolete file)

Decoding ACE to UTF-8, nsIIDNService::ConvertACEtoUTF8 is not implemented.

darin, what kind of situation we get ACE from server?
lot's of cases...

 - HTTP redirect
 - cookies
 - HTML content

web content in general can rightfully contain ACE encoded hostnames. 
nsStandardURL simply needs to always check the hostname to see if it matches an
ACE encoding.  seems to me that nsIIDNService should have a method to check if a
string is ACE encoded or not.  because if not, then it is cheaper to avoid the
buffer copy implicit in nsIIDNService::ConvertACEtoUTF8.
yes, we can check the prefix "xn--"
Attachment #117922 - Flags: review?(shanjian)
Comment on attachment 117922 [details] [diff] [review]
Implements ACE to UTF-8 and modified the test program.

looks fine.
Attachment #117922 - Flags: review?(shanjian) → review+
Attachment #117922 - Flags: superreview?(darin)
Comment on attachment 117922 [details] [diff] [review]
Implements ACE to UTF-8 and modified the test program.

>Index: dns/src/nsIDNService.cpp

>+  if (!IsASCII(input)) {
>+    _retval.Assign(input);
>+    return NS_OK;
>+  }

hmm... i'm not sure we should be checking this here.  it seems like
added overhead that shouldn't be needed.  it should be enough to just
scan the first bytes for the ACE prefix.

remember: this function is going tobe called a lot, so it is worth it
to spend the time to carefully optimize it.


>+  // RFC 3490 - 4.2 ToUnicode
>+  // The ToUnicode output never contains more code points than its input.
>+  PRInt32 prefixLen = strlen(mACEPrefix);

strlen(mACEPrefix) == 4 ... why not just hardcode the value?  or
use a #define at least.  strlen is overkill.


>+                                                PromiseFlatCString(
>+                                                  Substring(
>+                                                            in, 
>+                                                            prefixLen, 
>+                                                            in.Length() - prefixLen)).get(),

isn't this equivalent to:

  in.get() + 4

provided you declare |in| to be |const nsAFlatCString&| or |const nsCString&|.
PromiseFlatCString is expensive here since it will heap allocate and copy the
input string.  avoid if possible.


>+  // UCS4 -> UTF8
>+  output[output_length] = 0;
>+  nsAutoString utf16;
>+  ucs4toUtf16(output, utf16);
>+  delete [] output;
>+  out.Assign(NS_ConvertUCS2toUTF8(utf16));

seems like there should be a common routine for this conversion someplace.
are you duplicating code that could be factored into a common subroutine?


>+  // Validation: encode back to ACE and compare the strings

ouch!  is this strictly required?  i guess so, but it is going to make
the conversion from ACE to UTF-8 much more expensive.  can't we just
assume the input is valid?  can this validation step be DEBUG only?
Attachment #117922 - Flags: superreview?(darin) → superreview-
>+  if (!IsASCII(input)) {
>+    _retval.Assign(input);
>+    return NS_OK;
>+  }
This is based on RFC 3490 4.2.1. The spec seems to be allowing non-Ascii (4.2.2)
which we do not support but we need to check that case.
The prefix check is required after the Ascii check, I forgot to put that but I
need to add the prefix check.
I think the caller may call IsACE() before calling ConvertACEtoUTF8() for
performance.

>+  PRInt32 prefixLen = strlen(mACEPrefix);
This is because the prefix is also specified by pref. But since we only support
"xn--" and "bq--" anyway, we can change it to a constant. I will do the change.

>are you duplicating code that could be factored into a common subroutine?
No, that is specific to that code. There is other code which converts UCS4 to
UTF-16 but not to UTF-8.

>+  // Validation: encode back to ACE and compare the strings
Yes, that is required (RFC 3490 4.2.6-7). I think it is better to follow the
spec at least for now since we may get invalid domain names and it is safter to
do the required check.


>I think the caller may call IsACE() before calling ConvertACEtoUTF8() 
>for performance.
But the case like www.xn--ENCODED.com, IsACE() returns false for that. 
I think IsACE() has to change to search for "xn--" not only for the beginning of
the input string. (but I have no idea how to do that for nsACString).

>+  PRInt32 prefixLen = strlen(mACEPrefix);
This is because the prefix is also specified by pref. But since we only support
"xn--" and "bq--" anyway, we can change it to a constant. I will do the change.

remember you made IsACE hardcode 4.  if it is variable, then you should use
strlen.  i didn't realize it was pref controlled.
That was my mistake using 4 there. But we can make it constant now since "xn--"
is official and we only support "xn--" and "bq--".
Attachment #118341 - Flags: superreview?(darin)
does our pref checking code ensure that mACEPrefix will always be 4 characters
in length?
yes, the code checks the length
http://lxr.mozilla.org/seamonkey/source/netwerk/dns/src/nsIDNService.cpp#89
 89       rv = prefBranch->GetCharPref("network.IDN_prefix", getter_Copies(prefix));
 90       if (NS_SUCCEEDED(rv) && 
 91         prefix.Length() <= kACEPrefixLen) {
 92         strncpy(mACEPrefix, prefix.get(), kACEPrefixLen);
 93         mACEPrefix[sizeof(mACEPrefix)-1] = '\0';
 94       }
Comment on attachment 118341 [details] [diff] [review]
Use constant value for prefix length, changed IsACE to handle entire domain name as an input.

>Index: dns/src/nsIDNService.cpp

might be good to move decl of nsDependentCString(mACEPrefix, ...) out:

    nsDependentCString prefix(mACEPrefix, kACEPrefixLen);

then you won't have to instantiate it twice.


>+  // also check for the case like "www.xn--ENCODED.com"
>+  // in case for this is called for an entire domain name

what about "aaa.bbb.xn--bar.com"?


nothing show-stopping...  the code has a lot of buffer copies,
but i don't see a quick way to eliminate those.
>what about "aaa.bbb.xn--bar.com"?
that is also taken care by searching for "." + prefix
Attachment #118442 - Flags: superreview?(darin)
Comment on attachment 118442 [details] [diff] [review]
nsDependentCString change by darin's suggestion.

sr=darin (ok)
Attachment #118442 - Flags: superreview?(darin) → superreview+
checked in to the trunk

darin, who should call this decode function, shall I file a separate bug for that?
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
nsStandardURL will need to call this decode function.  yes, a separate bug would
be best.
filed bug 199229 for the caller side implementation
Attachment #118341 - Flags: superreview?(darin)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: