Closed Bug 237818 Opened 20 years ago Closed 20 years ago

nsStandardURL should normalize UTF-8 hostnames

Categories

(Core :: Networking, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla1.8alpha1

People

(Reporter: darin.moz, Assigned: dbaron)

References

Details

(Keywords: intl, Whiteboard: [IDN][patch])

Attachments

(2 files, 1 obsolete file)

nsStandardURL should normalize UTF-8 hostnames.

BuildNormalizedSpec, SetHost, and any other methods that modify the host field
of nsStandardURL should normalize the given value if it is UTF-8.

We might want to add a NormalizeUTF8 method on nsIIDNService.
Status: NEW → ASSIGNED
Whiteboard: [IDN]
Target Milestone: --- → mozilla1.8alpha
Blocks: IDN
> We might want to add a NormalizeUTF8 method on nsIIDNService

Can we just use nsIUnicodeNormalizer? There may be some issues to think about,
though.
Keywords: intl
> Can we just use nsIUnicodeNormalizer? There may be some issues to think about,
> though.

We could do that, but personally I think we should use the same normalization
algorithm used by nsIIDNService.  (Maybe nsIUnicodeNormalizer is what
nsIIDNService already uses?? -- I thought IDN required special normalization
rules.)  Whatever works! :-)
Yeah, I vaguely remembered that IDN has its own (slightly different?)
normalization rules so that I wrote an 'escape clause' ("there may be issues to
think about, though") :-) We have to read the IDN standard document :-)
We currently just use the normalizer, it seems:

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/netwerk/dns/src/nsIDNService.cpp&rev=1.17&mark=393#351

Since we probably don't want to rip huge pieces of a URL out if the user enters
it and it's invalid according to IDN (so they can fix their typo, say?), perhaps
just normalization is best at this stage -- since it seems to be the only thing
we do other than check for invalid things.
Er, except that the "Map" bit is probably doing something. :-)
Comment on attachment 144199 [details] [diff] [review]
possible patch

first glance looks right on to me... i'll take a closer look tomorrow.	thx
dbaron!
Attachment #144199 - Flags: review?(darin)
Comment on attachment 144199 [details] [diff] [review]
possible patch

>Index: base/src/nsStandardURL.cpp

>+    // However, perform Unicode normalization on it, as IDN does.
>+    mHostEncoding = eEncoding_ASCII;
>+    if (mHost.mLen > 0) {
>+        const nsACString& tempHost =
>+            Substring(spec + mHost.mPos, spec + mHost.mPos + mHost.mLen);
>+        if (IsASCII(tempHost)) {
>+            approxLen += mHost.mLen;
>+        } else {
>+            mHostEncoding = eEncoding_UTF8;
>+            if (gIDNService &&
>+                NS_SUCCEEDED(gIDNService->Normalize(tempHost, encHost))) {
>+                approxLen += encHost.Length();
>+            } else {
>+                encHost.Truncate();
>+                approxLen += mHost.mLen;
>+            }
>+        }
>+    }

only some nits here...

(1) this file avoids brackets when not needed

(2) should not be necessary to Truncate encHost in the failure case.
    an XPCOM method is not supposed to set an out-param value when it
    fails.  plus, it looks like nsIDNService::Normalize isn't going to
    cause any trouble in this regard.


> nsStandardURL::GetAsciiHost(nsACString &result)
> {
>-    if (mHostEncoding == eEncoding_Unknown) {
>-        if (IsASCII(Host()))
>-            mHostEncoding = eEncoding_ASCII;
>-        else
>-            mHostEncoding = eEncoding_UTF8;
>-    }
>+    NS_ASSERTION(mHostEncoding != eEncoding_Unknown, "encoding should be set");

my only concern here is that someone could construct a nsStandardURL via
CreateInstance.  they could decide not to initialize the nsStandardURL.
should we really assert in that case?  perhaps we should initialize
mHostEncoding to eEncoding_ASCII in the nsStandardURL ctor.  it should
be okay to regard an empty hostname as ASCII.  then again, i'm not sure
that the rest of nsStandardURL works that well when not initialized :-/


sr=darin
Attachment #144199 - Flags: superreview+
Attachment #144199 - Flags: review?(jshin)
Attachment #144199 - Flags: review?(darin)
(In reply to comment #9)
> my only concern here is that someone could construct a nsStandardURL via
> CreateInstance.  they could decide not to initialize the nsStandardURL.

personally, I think they deserve whatever they get :)
Attached patch patchSplinter Review
This fixes the two numbered comments and changes where mHostEncoding is set (so
it's never Unknown).
Attachment #144199 - Attachment is obsolete: true
Attachment #144426 - Flags: superreview?(darin)
Attachment #144426 - Flags: review?(jshin)
Comment on attachment 144426 [details] [diff] [review]
patch

Here we assert.

>+            if (gIDNService &&
>+                NS_SUCCEEDED(gIDNService->Normalize(tempHost, encHost)))
>+                approxLen += encHost.Length();
>+            else {
>+                NS_ASSERTION(encHost.IsEmpty(),
>+                             "Normalize failed by modified encHost");

BTW, it's not documented in the idl file that the out parameter is empty when
normalize fails (although the only implementation does that)...

>+                approxLen += mHost.mLen;

However, we don't here. 

>+        if (!IsASCII(flat)) {
>+            mHostEncoding = eEncoding_UTF8;
>+            if (gIDNService &&
>+                NS_SUCCEEDED(gIDNService->Normalize(flat, escapedHost))) {
>+                host = escapedHost.get();
>+                len = escapedHost.Length();


>+NS_IMETHODIMP nsIDNService::Normalize(const nsACString & input, nsACString & output)
>+{
>+  nsAutoString outUTF16;
>+  nsresult rv = stringPrep(NS_ConvertUTF8toUTF16(input), outUTF16);
>+  if (NS_SUCCEEDED(rv))
>+    CopyUTF16toUTF8(outUTF16, output);
>+  return rv;

How about moving the assertion here? Isn't stringPrep always supposed to
succeeed? Alternatively, we can assert at both places.

r=jshin
Attachment #144426 - Flags: review?(jshin) → review+
Comment on attachment 144426 [details] [diff] [review]
patch

>+                             "Normalize failed by modified encHost");

s/by/but/


sr=darin
Attachment #144426 - Flags: superreview?(darin) → superreview+
Assignee: darin → dbaron
Status: ASSIGNED → NEW
Priority: -- → P1
Whiteboard: [IDN] → [IDN][patch]
(In reply to comment #12)
> BTW, it's not documented in the idl file that the out parameter is empty when
> normalize fails (although the only implementation does that)...

darin said in comment 9 that that's implied, although I'd never heard that rule
before.

> >+                approxLen += mHost.mLen;
> 
> However, we don't here. 

Don't what?

> How about moving the assertion here? Isn't stringPrep always supposed to
> succeeed? Alternatively, we can assert at both places.

No, stringPrep can fail for some cases of input text (I think there are some
rules on bidi text and an "isprohibited" function) -- and I'd rather leave the
URL unmodified in such cases so the user can correct it.  Presumably the failure
is handled by the DNS lookup code in a way that causes the hostname lookup to
fail.  I should probably write a testcase for this, though...
Comment on attachment 144426 [details] [diff] [review]
patch

Requesting approval.  This should be pretty safe, and it fixes what could be
perceived as a minor security problem (strange things could appear in the URL
bar for well-known sites).
Attachment #144426 - Flags: approval1.7?
Comment on attachment 144426 [details] [diff] [review]
patch

a=chofmann for 1.7
Attachment #144426 - Flags: approval1.7? → approval1.7+
Fix checked in to trunk, 2004-04-02 23:32 -0800.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Verified with windows 1.0.7, 1.7.12, Mac 1.0.7, 1.7.12
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: