Closed Bug 618876 Opened 14 years ago Closed 12 years ago

Support internationalized <input type="email">

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

RESOLVED FIXED
mozilla13

People

(Reporter: crazy-daniel, Assigned: mounir)

Details

(Keywords: intl, Whiteboard: [parity-opera])

Attachments

(1 file, 2 obsolete files)

I noticed that input@type=email doesn't seem to support IDNs, for example the valid address "Max.Müller@example.org" isn't accepted as a valid mail address due to the u umlaut.
However, that string is a valid e-mail address and addresses like this get used more and more.

I'm sorry if this is too similar to issues like bug 127399 or bug 410763 (which are Thunderbird issues).
Note that this is purely a UI issue; "Max.Müller@example.org" is not an acceptable value for an email input. However, we should show "Max.Müller@example.org" while keeping "Max.xn--mller-kva@example.org" as the actual value.

Hence, sending to Firefox. Mounir, do move it to a better place if you know one.
Component: DOM: Core & HTML → General
Product: Core → Firefox
QA Contact: general → general
I'm not sure where this should be done but probably not in the Firefox component. I guess this could be done by the content so every time a value is set, it will be converted to the correct IDN string. But the editor should be able to understand that too.

Jonas, Ehsan, opinions?
Product: Firefox → Core
QA Contact: general → general
Mounir, what does the spec say about this?  We can go through hoops to make the underlying value of the editor to be the punycode and the displayed value being the Unicode value, but I think this is something that the spec needs to address.  Also, the same thing goes with <input type=url>, right?
Component: General → DOM: Core & HTML
QA Contact: general → general
Summary: <input type="email"> doesn't support IDNs → <input type="email"> and <input type="url"> don't support IDNs
(In reply to comment #3)
> Mounir, what does the spec say about this?  We can go through hoops to make the
> underlying value of the editor to be the punycode and the displayed value being
> the Unicode value, but I think this is something that the spec needs to
> address.  Also, the same thing goes with <input type=url>, right?

That's what the spec calls for.
(In reply to comment #4)
> (In reply to comment #3)
> > Mounir, what does the spec say about this?  We can go through hoops to make the
> > underlying value of the editor to be the punycode and the displayed value being
> > the Unicode value, but I think this is something that the spec needs to
> > address.  Also, the same thing goes with <input type=url>, right?
> 
> That's what the spec calls for.

Could you please post a link?
> User agents may transform the value for display and editing (e.g. converting
> punycode in the value to IDN in the display and vice versa).

<http://www.whatwg.org/html/#e-mail-state>
So, if I'm reading the spec correctly, we should return the punycode from nsHTMLInputElement::GetValue, and we can continue using the IDN variation for display and editing.  What do you think, Mounir?
Also, we should convert to punycode for validation...  It seems to me that all of the punycode conversion needs to happen in the content, and it doesn't need to change anything on the editor side.
Could we simply add two functions like:

GetDisplayValue/SetDisplayValue which puny-decodes and puny-encodes respectively. Then whenever the editor picks up the value from the input element it uses GetDisplayValue and whenever the editor wants to poke the value back into the input it uses SetDisplayValue.

That way the content side of things always deals with the value which the DOM and submission code uses. And editor always sees a user-friendly value. And those two functions handle the conversion in between.

Later we can expand those functions to deal with comma-separation issues for multiple email addresses etc.
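
A minimal sketch of that pair of functions, written in Python rather than Gecko C++ so it can be run on its own. The names get_display_value and set_display_value are hypothetical, Python's built-in "idna" codec stands in for nsIIDNService, and converting only the part after the last "@" is an assumption of this sketch, not something settled in this bug.

```python
def set_display_value(display: str) -> str:
    """Puny-encode the domain of what the editor shows, giving the DOM value."""
    local, sep, domain = display.rpartition("@")
    if not sep:
        return display  # no "@": nothing to convert in this sketch
    try:
        ascii_domain = domain.encode("idna").decode("ascii")
    except UnicodeError:
        return display  # conversion failed: keep the text as typed
    return f"{local}@{ascii_domain}"


def get_display_value(value: str) -> str:
    """Puny-decode the domain of the DOM value, giving what the editor shows."""
    local, sep, domain = value.rpartition("@")
    if not sep:
        return value
    try:
        unicode_domain = domain.encode("ascii").decode("idna")
    except UnicodeError:
        return value  # not a decodable ASCII domain: show it unchanged
    return f"{local}@{unicode_domain}"
```

With these, content code would only ever see the result of set_display_value, and the editor would only ever see the result of get_display_value.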
I agree with Jonas: it would be better to have the content only dealing with puny-encoded values and have the editor requesting puny-decoded value and setting puny-encoded ones.

Do we want to fix this for Gecko 2.0?
(In reply to comment #10)
> Later we can expand those functions to deal with comma-separation issues for
> multiple email addresses etc.

I don't think we would be able to use those functions for comma-separation issues. Punycode creates a unique and reversible code but the comma-separation doesn't.
IOW: "foo@bar.com, bar@bar.com" AND "   foo@bar.com   , bar@bar.com   " will have the exact same DOM value: "foo@bar.com,bar@bar.com".
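
To make the non-reversibility concrete, here is a small Python sketch. The function name normalize_multiple is hypothetical, and it assumes the normalization is simply "split on commas, trim whitespace, re-join".

```python
def normalize_multiple(display: str) -> str:
    """Collapse a comma-separated address list to its whitespace-free form."""
    return ",".join(part.strip() for part in display.split(","))


a = normalize_multiple("foo@bar.com, bar@bar.com")
b = normalize_multiple("   foo@bar.com   , bar@bar.com   ")
# Both inputs collapse to the same string, so the original spacing
# cannot be recovered from the normalized value.
same = a == b
```

Because two different displayed strings map to one value, there is no inverse function that could rebuild what the user typed, which is the difference from punycode.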
(In reply to comment #11)
> I agree with Jonas: it would be better to have the content only dealing with
> puny-encoded values and have the editor requesting puny-decoded value and
> setting puny-encoded ones.
> 
> Do we want to fix this for Gecko 2.0?

I'd say yes.  In its current form, these input fields are pretty broken for international users.
blocking2.0: --- → ?
Keywords: intl
Sicking convinced me that this shouldn't block.  Honestly, at this point, it doesn't take a lot to convince me that _any_ bug shouldn't block!  ;-)
blocking2.0: ? → ---
Do we have generic puny-encode / puny-decode methods? I didn't see anything except GetASCIIOrigin and GetUTFOrigin which are far from being generic.
nsIIDNService should have useful things on it.
Blocks: 344614
Whiteboard: [parity-opera]
Attached patch Proof of Concept (obsolete) — Splinter Review
I think we over-complicated this in the comments. The easiest solution is to transform to punycode only when we want to validate the value or submit it.

A few comments about this patch:
- I think it would be better to add convertUTF8toACE and convertUTF16toACE methods taking an inout argument to nsIIDNService, but I'm not sure if there are specific reasons why it wasn't done that way initially;
- nsIIOService::NewURI might punyencode the value already. At least, <input type='url'> accepts values with UTF8 characters;
- Tests are missing.
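
The "convert only when validating" idea could look roughly like the Python sketch below. It is not the attached patch: is_valid_email is a hypothetical name, the "idna" codec stands in for nsIIDNService, the ASCII pattern is a simplified assumption rather than the exact HTML rule, and only the domain is converted.

```python
import re

# Simplified ASCII-only address pattern (an assumption for this sketch).
_ASCII_EMAIL = re.compile(
    r"^[A-Za-z0-9.!#$%&'*+/=?^_`{|}~-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*$"
)


def is_valid_email(value: str) -> bool:
    """Validate a puny-encoded copy of the value; the value itself is untouched."""
    local, sep, domain = value.rpartition("@")
    if not sep:
        return False
    try:
        ascii_domain = domain.encode("idna").decode("ascii")
    except UnicodeError:
        return False  # the domain cannot be converted, so treat it as invalid
    return _ASCII_EMAIL.match(f"{local}@{ascii_domain}") is not None
```

The stored value stays in whatever form the user entered; only the temporary copy handed to the pattern is converted. In this sketch a non-ASCII localpart still fails validation.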
Assignee: nobody → mounir
Status: NEW → ASSIGNED
Attachment #543018 - Flags: feedback?(jonas)
Comment on attachment 543018 [details] [diff] [review]
Proof of Concept

Review of attachment 543018 [details] [diff] [review]:
-----------------------------------------------------------------

Don't you also need to return punycode for .value? I think you can add that conversion to G/SetValue though. But I like the general approach.

::: content/base/public/nsContentUtils.h
@@ +1723,5 @@
>     */
>    static void InitializeTouchEventTable();
> +
> +  static void TransformToPunycode(nsAString& aValue);
> +  static void TransformToPunycode(nsACString& aValue);

I'd rather this didn't use in-out parameters and instead used separate in and out arguments.
Attachment #543018 - Flags: feedback?(jonas) → feedback+
This patch is a bit wrong: the spec doesn't ask us to submit punycoded values for url fields, and these fields actually handle UTF-8 values very well.
I've opened bug 670883 to test UTF-8 values for <input type=url>.
Summary: <input type="email"> and <input type="url"> don't support IDNs → Support internationalized <input type="email">
I reopened the W3 bug because I do not agree with the resolution: there is no reason to submit a punycoded value for <input type='email'> when we submit UTF-8 values for <input type='text'>. Even if SMTP servers might not accept UTF-8 email addresses, websites are already used to handling that situation. No need to be over-protective, I believe.

We should only allow/validate UTF-8 values.
Attached patch Patch v1 Splinter Review
Puny-encode value before validating.

I will have to open a follow-up because if an email address contains UTF-8 characters and is longer than 63 characters without a ".", it will not validate, due to an nsIIDNService implementation limitation related to DNS.
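
A rough Python illustration of that limit, assuming the converter applies the DNS 63-octet label limit to every dot-separated run of the string it is given. The helper names are hypothetical, the "punycode" codec stands in for nsIIDNService, no Nameprep step is applied, and whether pure-ASCII values ever reach the converter is not modeled.

```python
from typing import List

MAX_LABEL_OCTETS = 63  # DNS limit on the length of a single label


def ace_length(label: str) -> int:
    """Length of a label after conversion to its ASCII-compatible form."""
    if label.isascii():
        return len(label)
    # Non-ASCII labels become "xn--" followed by the punycode output.
    return len("xn--") + len(label.encode("punycode"))


def overlong_labels(value: str) -> List[str]:
    """Return the dot-separated runs whose converted form exceeds the limit."""
    return [
        label for label in value.split(".")
        if ace_length(label) > MAX_LABEL_OCTETS
    ]
```

Under these assumptions, an address such as seventy "ü" characters followed by "@example.org" has one run before the first "." that is too long, so a converter enforcing the limit would reject it even though the domain itself is short.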
Attachment #543018 - Attachment is obsolete: true
Attachment #545359 - Flags: review?(jonas)
Whiteboard: [parity-opera] → [parity-opera][needs review]
sicking: ping?
Comment on attachment 545359 [details] [diff] [review]
Patch v1

Sorry about the extreme slowness :(

Hot chocolate is on me.
Attachment #545359 - Flags: review?(jonas) → review+
Flags: in-testsuite+
Whiteboard: [parity-opera][needs review] → [parity-opera]
Target Milestone: --- → mozilla13
Attachment #545359 - Flags: checkin+
https://hg.mozilla.org/mozilla-central/rev/34d97151ab88
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED

This was fixed for domains in 2011; this patch adds support for Unicode in
localparts.

https://github.com/whatwg/html/issues/4562 contains relevant discussion.

(Note that punycode is only defined for domains; you're not permitted to
assume anything about other people's localparts. In particular you're not
permitted to assume that grå@... is equivalent to xn--gr-zia@...)
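
As a Python sketch of that rule: compare domains through their IDNA form, but compare localparts only as the exact strings given. The function names are hypothetical, and Python's "idna" codec is used as a stand-in for whatever IDN conversion the implementation uses.

```python
from typing import Tuple


def split_address(address: str) -> Tuple[str, str]:
    """Split at the last "@" into (localpart, domain)."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("address has no '@'")
    return local, domain


def same_mailbox(a: str, b: str) -> bool:
    """True only if localparts match exactly and domains match under IDNA."""
    local_a, domain_a = split_address(a)
    local_b, domain_b = split_address(b)
    # Domains: the Unicode and xn-- forms name the same host.
    domains_equal = (
        domain_a.encode("idna").lower() == domain_b.encode("idna").lower()
    )
    # Localparts: opaque to everyone but the receiving server, so no
    # punycode (or any other) transformation is applied here.
    return local_a == local_b and domains_equal
```

Here "a@müller.example" and "a@xn--mller-kva.example" compare equal, while "grå@example.org" and "xn--gr-zia@example.org" do not.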

A patch has been attached on this bug, which was already closed. Filing a separate bug will ensure better tracking. If this was not by mistake and further action is needed, please alert the appropriate party. (Or: if the patch doesn't change behavior -- e.g. landing a test case, or fixing a typo -- then feel free to disregard this message)

Please file a new bug for this patch.

Flags: needinfo?(arnt)

Comment on attachment 9329911 [details]
Bug 618876 - support internationalized <input type="email"> r=mkmelin

Revision D176259 was moved to bug 1829657. Setting attachment 9329911 [details] to obsolete.

Attachment #9329911 - Attachment is obsolete: true
Flags: needinfo?(arnt)