Closed Bug 237819 Opened 21 years ago Closed 21 years ago

nsStandardURL should automatically convert ACE hostnames to UTF8

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla1.8alpha2

People

(Reporter: darin.moz, Assigned: darin.moz)

References

Details

(Whiteboard: [IDN])

Attachments

(1 file, 1 obsolete file)

v1 ptach 21 years ago Darin Fisher 6.29 KB, patch		Details \| Diff \| Splinter Review
v1.1 patch 21 years ago Darin Fisher 7.71 KB, patch	Biesinger : review+ dbaron : superreview+	Details \| Diff \| Splinter Review

Darin Fisher

Assignee

Description

•

21 years ago

nsStandardURL should automatically convert ACE hostnames to UTF8. If given an ACE hostname, nsStandardURL (::BuildNormalizedSpec, etc.) should convert the ACE hostname to UTF8 and expose the UTF8 value via its host, hostPort, prePath, and spec attributes. We can make this efficient by using nsIIDNService::IsACE

Darin Fisher

Assignee

Updated

•

21 years ago

Status: NEW → ASSIGNED

Whiteboard: [IDN]

Target Milestone: --- → mozilla1.8alpha

Darin Fisher

Assignee

Updated

•

21 years ago

Blocks: 237820

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 1

•

21 years ago

[ This comment is in UTF-8. ] It's worth noting that the "Host: " field in the "Details" part of the cookie confirmation dialog can currently show up in ACE, e.g., for http://www.öko.de/ .

Darin Fisher

Assignee

Updated

•

21 years ago

Blocks: 238046

Darin Fisher

Assignee

Comment 2

•

21 years ago

dbaron: yeah, i think that deserves a separate bug though. we need to go through cookies and add IsACE + conversions to UTF-8 in places that expose hostnames to UI. i think this bug is about making sure that any nsStandardURL constructed with an ACE hostname normalizes the URL's hostname field to the UTF-8 form. in particular this bug addresses some of the ACE "ui-leakage" that can occur when a site redirects the browser to a URL with an IDN hostname. patching coming up...

Darin Fisher

Assignee

Comment 3

•

21 years ago

Attached patch v1 ptach (obsolete) — Details — Splinter Review

This seems to do the trick.

Darin Fisher

Assignee

Updated

•

21 years ago

Attachment #147854 - Flags: superreview?(dbaron)

Attachment #147854 - Flags: review?(jshin)

Darin Fisher

Assignee

Updated

•

21 years ago

Blocks: 243498

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 4

•

21 years ago

Comment on attachment 147854 [details] [diff] [review] v1 ptach My one issue with this patch is that it changes things to that if there's no IDN service or if gIDNService->Normalize fails (which can happen in response to user input), mHostEncoding gets set to ASCII even when the encoding isn't ascii. That might be a problem. (We don't want a URL that doesn't meet the Bidi requirements for IDN normalization to be corrupted -- we want to allow the user to correct it when typing in the URL bar. And perhaps other potential issues.)

Darin Fisher

Assignee

Comment 5

•

21 years ago

That's an excellent point dbaron. I will revise the patch. Thanks!

Darin Fisher

Assignee

Updated

•

21 years ago

Attachment #147854 - Attachment is obsolete: true

Attachment #147854 - Flags: superreview?(dbaron)

Attachment #147854 - Flags: review?(jshin)

Darin Fisher

Assignee

Comment 6

•

21 years ago

Attached patch v1.1 patch — Details — Splinter Review

Here's a better patch. It's not great. I'm not terribly happy with NormalizeIDN since it now modifies mHostEncoding directly. I'd prefer if it were a real subroutine with no side-effects, but anyways it already was pretty specific to the callsites. It only really exists as a way to share some code, so I decided to just accept the side-effect of setting mHostEncoding. Maybe there is a better way, but this seems to do the trick. No need for too much fancy code factoring here.

Darin Fisher

Assignee

Updated

•

21 years ago

Attachment #150317 - Flags: superreview?(dbaron)

Attachment #150317 - Flags: review?(cbiesinger)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Updated

•

21 years ago

Attachment #150317 - Flags: superreview?(dbaron) → superreview+

Darin Fisher

Assignee

Updated

•

21 years ago

Target Milestone: mozilla1.8alpha1 → mozilla1.8alpha2

Christian :Biesinger (don't email me, ping me on IRC)

Comment 7

•

21 years ago

Comment on attachment 150317 [details] [diff] [review] v1.1 patch + PRBool EscapeIPv6(const char *host, nsCString &result); + PRBool NormalizeIDN(const nsCSubstring &host, nsCString &result); maybe you could document what NormalizeIDN does? especially the meaning of the return value... it seems it returns PR_TRUE when it "did something"? + // NOTE: As a side-effect this function sets mHostEncoding. maybe the encoding should be an out parameter (by reference)? + nsCAutoString hostBuf; + if (EscapeIPv6(host, hostBuf)) { + host = hostBuf.get(); + len = hostBuf.Length(); + } + else if (NormalizeIDN(flat, hostBuf)) { + host = hostBuf.get(); + len = hostBuf.Length(); } hm... why does this function (SetHost) not call the ACString& version of ReplaceSegment? Then you can get rid of these assignments to host and len, and maybe pass escapedHost directly to ReplaceSegment; although you'd have to assign host to hostBuf in the else-case. r=me with or without these changes.

Attachment #150317 - Flags: review?(cbiesinger) → review+

Darin Fisher

Assignee

Comment 8

•

21 years ago

> maybe you could document what NormalizeIDN does? especially the meaning of the > return value... it seems it returns PR_TRUE when it "did something"? yeah, i have it on my todo list to document all the functions. it returns true when it puts something in the result buffer. > maybe the encoding should be an out parameter (by reference)? i almost did this, but it seemed overkill given that the function already knows a lot about the callsite. also mHostEncoding is a 2-bit wide bit-field, so i can't pass it into NormalizeIDN. i'd have to pass in a temporary value, and then i'd have to assign that temporary into mHostEncoding. it didn't seem worth it to me. > why does this function (SetHost) not call the ACString& version of > ReplaceSegment? <snip> > although you'd have to assign host to hostBuf in the else-case. exactly. i'm avoiding the buffer copy. the else-case is the most common case.

Darin Fisher

Assignee

Comment 9

•

21 years ago

fixed-on-trunk

Status: ASSIGNED → RESOLVED

Closed: 21 years ago

Resolution: --- → FIXED

benc

Comment 10

•

20 years ago

V/lxr: http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=nsStandardURL.h&branch=&root=/cvsroot&subdir=mozilla/netwerk/base/src&command=DIFF_FRAMESET&rev1=1.19&rev2=1.20 http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=nsStandardURL.cpp&branch=&root=/cvsroot&subdir=mozilla/netwerk/base/src&command=DIFF_FRAMESET&rev1=1.62&rev2=1.63

Status: RESOLVED → VERIFIED

Darin Fisher

Assignee

Updated

•

20 years ago

Blocks: 199229

Adam M. Costello

Comment 11

•

20 years ago

NormalizeIDN() contains the comment "normalization could result in an ASCII only hostname". That's true, but there's a further implication that is neglected: The resulting ASCII-only hostname could be an ACE. I assume the intention of NormalizeIDN() is never to return an ACE (unless the show_punycode pref forces it). The ToUnicode operation in RFC-3490 handles this case. The mozilla source code does not seem to contain an implementation ToUnicode. The comment for convertACEtoUTF8() says "This is the ToUnicode operation", but it's not: it's missing the first two steps, and it operates on entire domain names, whereas ToUnicode operates on individual labels. Hmmm, if a domain name contains some ACEs and at least one bogo-ACE (something that begins with the ACE prefix but is not a valid ACE), none of the valid ACEs will get converted. It's not clear to me what is the underlying intention behind NormalizeIDN(). If the input is "FooBar", then it will be left untouched, in mixed-case, but if the input has an accent over the 'a', then it will be Nameprep'd and hence forced to all-lower-case. If the goal is a canonical form, then you need to lowercase even pure ASCII inputs. If the goal is not a canonical form, then why bother with Nameprep for non-ASCII inputs? Or is the goal a semi-canonical form that can be compared using a comparison function that knows the ASCII case pairs but no others? If one goal is human readability, then you want to avoid returning an ACE. The ToUnicode operation is designed to accomplish exactly that: it takes any domain label, in any form, and always returns an equivalent label that is not an ACE. If, in addition to being non-ACE, you also want a canonical form (that is, one that can be compared using a byte-wise comparison), then you want to apply both ToUnicode and Nameprep, in either order (the result will be the same). If Nameprep is applied before ToUnicode, then the first two steps of ToUnicode will make no difference and can be optimized away. By the way, the comment for isACE() is somewhat misleading (as is the name of the function). As implemented, it does not check whether the input is an ACE (or is "ACE encoded"); it checks for the presence of the ACE prefix. All ACEs begin with the ACE prefix, but not all strings beginning with the ACE prefix are ACEs (as stated in section 5 of RFC-3490). I understand that this function is intended only for performance optimization, so it's okay to calculate a loose boundary around the set of ACEs, but it would be good for the comment to more accurately describe the behavior.

You need to log in before you can comment on or make changes to this bug.