Open Bug 706691 Opened 13 years ago Updated 2 years ago

Use separate types for ASCII vs punycode vs UTF-8 strings, especially for hostnames

Categories

(Core :: Networking, defect, P5)



People

(Reporter: briansmith, Unassigned)

Details

(Whiteboard: [necko-would-take])

+++ This bug was initially created as a clone of Bug #703508 +++

In bug 703508 comment 10, Kai noticed that the character encoding of nsNSSSocketInfo::mHostName is unclear. When we store hostnames in strings in Necko and PSM, we should make the expected encoding unambiguous. For example, we should have a "punycode" string type.
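A minimal sketch of what separate hostname types could look like. It is written as standalone C++ with std::string rather than Necko's nsCString, and the names AsciiHostname, Utf8Hostname, and HasPunycodeLabel are hypothetical; nothing like them exists in Necko or PSM. The point is only that a parameter typed AsciiHostname says "ASCII, possibly with punycode labels" in a way that a bare nsACString& does not.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical type (not a Necko/PSM API): an ASCII-only hostname whose
// "xn--" labels, if any, are punycode (ACE) encoded.
class AsciiHostname {
 public:
  // Returns std::nullopt if the input contains any non-ASCII byte.
  static std::optional<AsciiHostname> FromString(std::string s) {
    for (unsigned char c : s) {
      if (c > 0x7F) return std::nullopt;
    }
    return AsciiHostname(std::move(s));
  }
  const std::string& str() const { return value_; }

 private:
  explicit AsciiHostname(std::string s) : value_(std::move(s)) {}
  std::string value_;
};

// Hypothetical type: a hostname in display form, encoded as UTF-8.
// This sketch does not validate the UTF-8 encoding.
class Utf8Hostname {
 public:
  explicit Utf8Hostname(std::string s) : value_(std::move(s)) {}
  const std::string& str() const { return value_; }

 private:
  std::string value_;
};

// Hypothetical helper: code that needs the ASCII form takes AsciiHostname,
// so passing a Utf8Hostname is a compile-time error instead of a silent
// encoding mix-up. Assumes the ACE prefix is lowercase "xn--".
bool HasPunycodeLabel(const AsciiHostname& host) {
  const std::string& s = host.str();
  std::size_t start = 0;
  while (start <= s.size()) {
    std::size_t end = s.find('.', start);
    if (end == std::string::npos) end = s.size();
    // A label is punycode-encoded if it starts with the ACE prefix.
    if (end - start >= 4 && s.compare(start, 4, "xn--") == 0) return true;
    start = end + 1;
  }
  return false;
}
```

With these types, `AsciiHostname::FromString("www.xn--bcher-kva.example")` succeeds while a UTF-8 string such as `"www.b\xC3\xBC" "cher.example"` is rejected and would instead be held in a `Utf8Hostname`. Inside Gecko one would presumably wrap nsCString instead of std::string; std::string only keeps the sketch self-contained.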

Is it the case that all the non-UTF-8 8-bit strings used for hostnames in Necko are considered punycode? It seems like we shouldn't have *any* code that is ASCII-but-not-to-be-interpreted-as-punycode, because such code wouldn't support IDNs at all.

Necko itself always stores hostnames as punycode or as the original UTF-8 strings (which generally are normalized to UTF-8 even when input as punycode originally). Where would any other encodings come from?
(In reply to Christian :Biesinger (don't email me, ping me on IRC) from comment #1)
> Necko itself always stores hostnames as punycode or as the original UTF-8
> strings (which generally are normalized to UTF-8 even when input as punycode
> originally). 

Good to hear.

> Where would any other encodings come from?

I don't know. My main point is that "ns[A]CString & hostname" doesn't scream "punycode," which leads to confusion and doubt, as in Kai's review of my patches in bug 703508 and bug 674147.
Whiteboard: [necko-would-take]
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P5
Severity: normal → S3