Bug 618876 - Support internationalized <input type="email">
Status: RESOLVED FIXED
Whiteboard: [parity-opera]
Keywords: intl
Product: Core
Classification: Components
Component: DOM: Core & HTML
Version: Trunk
Platform: All All
Importance: -- normal with 3 votes
Target Milestone: mozilla13
Assigned To: Mounir Lamouri (:mounir)
Duplicates: 623120
Blocks: html5forms 555559
Reported: 2010-12-13 13:00 PST by Daniel.S
Modified: 2012-03-01 05:58 PST
CC List: 14 users
Flags: mounir: in-testsuite+


Attachments
Proof of Concept (6.30 KB, patch)
2011-06-29 17:34 PDT, Mounir Lamouri (:mounir)
jonas: feedback+
Patch v1 (8.77 KB, patch)
2011-07-12 04:53 PDT, Mounir Lamouri (:mounir)
jonas: review+
mounir: checkin+

Description Daniel.S 2010-12-13 13:00:49 PST
I noticed that input@type=email doesn't seem to support IDNs; for example, the valid address "Max.Müller@example.org" isn't accepted as a valid mail address because of the u umlaut.
However, that string is a valid e-mail address, and addresses like this are used more and more.

I'm sorry if this is too similar to issues like bug 127399 or bug 410763 (which are Thunderbird issues).
Comment 1 :Ms2ger 2010-12-13 13:30:07 PST
Note that this is purely a UI issue; "Max.Müller@example.org" is not an acceptable value for an email input. However, we should show "Max.Müller@example.org" while keeping "Max.xn--mller-kva@example.org" as the actual value.

Hence, sending to Firefox. Mounir, do move it to a better place if you know one.
Comment 2 Mounir Lamouri (:mounir) 2010-12-13 14:27:06 PST
I'm not sure where this should be done, but probably not in the Firefox component. I guess this could be done in content code, so that every time a value is set it is converted to the correct IDN string. But the editor would need to understand that too.

Jonas, Ehsan, opinions?
Comment 3 :Ehsan Akhgari (busy, don't ask for review please) 2010-12-14 17:11:57 PST
Mounir, what does the spec say about this?  We can go through hoops to make the underlying value of the editor to be the punycode and the displayed value being the Unicode value, but I think this is something that the spec needs to address.  Also, the same thing goes with <input type=url>, right?
Comment 4 :Ms2ger 2010-12-15 00:23:48 PST
(In reply to comment #3)
> Mounir, what does the spec say about this?  We can go through hoops to make the
> underlying value of the editor to be the punycode and the displayed value being
> the Unicode value, but I think this is something that the spec needs to
> address.  Also, the same thing goes with <input type=url>, right?

That's what the spec calls for.
Comment 5 :Ehsan Akhgari (busy, don't ask for review please) 2010-12-15 00:41:27 PST
(In reply to comment #4)
> (In reply to comment #3)
> > Mounir, what does the spec say about this?  We can go through hoops to make the
> > underlying value of the editor to be the punycode and the displayed value being
> > the Unicode value, but I think this is something that the spec needs to
> > address.  Also, the same thing goes with <input type=url>, right?
> 
> That's what the spec calls for.

Could you please post a link?
Comment 6 :Ms2ger 2010-12-15 04:59:58 PST
> User agents may transform the value for display and editing (e.g. converting
> punycode in the value to IDN in the display and vice versa).

<http://www.whatwg.org/html/#e-mail-state>
Comment 7 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-01-04 19:58:13 PST
*** Bug 623120 has been marked as a duplicate of this bug. ***
Comment 8 :Ehsan Akhgari (busy, don't ask for review please) 2011-01-07 11:19:44 PST
So, if I'm reading the spec correctly, we should return the punycode form from nsHTMLInputElement::GetValue, and we can continue using the IDN form for display and editing.  What do you think, Mounir?
Comment 9 :Ehsan Akhgari (busy, don't ask for review please) 2011-01-07 11:21:03 PST
Also, we should convert to punycode for validation. It seems to me that all of the punycode conversion needs to happen in content code, and nothing needs to change on the editor side.
Comment 10 Jonas Sicking (:sicking) PTO Until July 5th 2011-01-07 18:15:41 PST
Could we simply add two functions like:

GetDisplayValue/SetDisplayValue, which puny-decode and puny-encode respectively. Then whenever the editor picks up the value from the input element it uses GetDisplayValue, and whenever the editor wants to poke the value back into the input it uses SetDisplayValue.

That way the content side of things always deals with the value which the DOM and submission code uses. And editor always sees a user-friendly value. And those two functions handle the conversion in between.

Later we can expand those functions to deal with comma-separation issues for multiple email addresses etc.
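A minimal sketch of the GetDisplayValue/SetDisplayValue split described above, assuming a std::string-based model with the two conversion functions injected. EmailInputModel and its Converter parameters are hypothetical names for illustration, not the actual nsHTMLInputElement interface. The stored value is what DOM and submission code see; only the editor goes through the display accessors.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical model of the proposed design: mValue always holds the form
// used by the DOM and form submission (e.g. punycode). The editor only uses
// the display accessors, which convert on the way out and on the way in.
class EmailInputModel {
 public:
  using Converter = std::function<std::string(const std::string&)>;

  EmailInputModel(Converter aToDisplay, Converter aFromDisplay)
      : mToDisplay(std::move(aToDisplay)),
        mFromDisplay(std::move(aFromDisplay)) {}

  // DOM-facing accessors: no conversion at all.
  const std::string& GetValue() const { return mValue; }
  void SetValue(const std::string& aValue) { mValue = aValue; }

  // Editor-facing accessors: decode when reading, encode when writing.
  std::string GetDisplayValue() const { return mToDisplay(mValue); }
  void SetDisplayValue(const std::string& aDisplay) {
    mValue = mFromDisplay(aDisplay);
  }

 private:
  Converter mToDisplay;    // e.g. puny-decode (assumed to be supplied)
  Converter mFromDisplay;  // e.g. puny-encode (assumed to be supplied)
  std::string mValue;
};
```

Keeping the conversion at the editor boundary means DOM-facing code never has to know about punycode. The design only works if the two converters round-trip exactly, which is the property comment 12 below questions for comma-separated lists.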
Comment 11 Mounir Lamouri (:mounir) 2011-01-08 05:15:20 PST
I agree with Jonas: it would be better to have the content only dealing with puny-encoded values and have the editor requesting puny-decoded value and setting puny-encoded ones.

Do we want to fix this for Gecko 2.0?
Comment 12 Mounir Lamouri (:mounir) 2011-01-08 05:17:35 PST
(In reply to comment #10)
> Later we can expand those functions to deal with comma-separation issues for
> multiple email addresses etc.

I don't think we would be able to use those functions for comma-separation issues. Punycode encoding is unique and reversible, but comma-separation normalization isn't.
IOW: "foo@bar.com, bar@bar.com" AND "   foo@bar.com   , bar@bar.com   " will have the exact same DOM value: "foo@bar.com,bar@bar.com".
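The irreversibility can be shown with a small sketch of that normalization: split on commas, strip leading and trailing ASCII whitespace from each token, and rejoin with bare commas. This is modelled on the HTML spec's value sanitization for multiple-address email inputs; NormalizeEmailList and its helpers are hypothetical names, not Gecko code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// ASCII whitespace as the HTML spec defines it: tab, LF, FF, CR and space.
static bool IsAsciiWhitespace(char aChar) {
  return aChar == '\t' || aChar == '\n' || aChar == '\f' || aChar == '\r' ||
         aChar == ' ';
}

static std::string StripAsciiWhitespace(const std::string& aToken) {
  std::size_t begin = 0, end = aToken.size();
  while (begin < end && IsAsciiWhitespace(aToken[begin])) ++begin;
  while (end > begin && IsAsciiWhitespace(aToken[end - 1])) --end;
  return aToken.substr(begin, end - begin);
}

// Hypothetical helper: split on commas, strip each token, rejoin with ",".
// Many different inputs produce the same output, so there is no inverse
// that a GetDisplayValue-style function could apply.
std::string NormalizeEmailList(const std::string& aValue) {
  std::vector<std::string> tokens;
  std::size_t start = 0;
  for (;;) {
    const std::size_t comma = aValue.find(',', start);
    tokens.push_back(StripAsciiWhitespace(aValue.substr(
        start, comma == std::string::npos ? comma : comma - start)));
    if (comma == std::string::npos) break;
    start = comma + 1;
  }
  std::string result;
  for (std::size_t i = 0; i < tokens.size(); ++i) {
    if (i > 0) result += ',';
    result += tokens[i];
  }
  return result;
}
```

Both strings from the comment above normalize to "foo@bar.com,bar@bar.com", so the original spacing cannot be recovered from the DOM value.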
Comment 13 :Ehsan Akhgari (busy, don't ask for review please) 2011-01-10 10:39:40 PST
(In reply to comment #11)
> I agree with Jonas: it would be better to have the content only dealing with
> puny-encoded values and have the editor requesting puny-decoded value and
> setting puny-encoded ones.
> 
> Do we want to fix this for Gecko 2.0?

I'd say yes.  In its current form, these input fields are pretty broken for international users.
Comment 14 :Ehsan Akhgari (busy, don't ask for review please) 2011-01-10 15:05:48 PST
Sicking convinced me that this shouldn't block.  Honestly, at this point, it doesn't take a lot to convince me that _any_ bug shouldn't block!  ;-)
Comment 15 Mounir Lamouri (:mounir) 2011-01-21 10:22:51 PST
Do we have generic puny-encode / puny-decode methods? I didn't see anything except GetASCIIOrigin and GetUTFOrigin which are far from being generic.
Comment 16 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-01-21 10:35:15 PST
nsIIDNService should have useful things on it.
Comment 17 Mounir Lamouri (:mounir) 2011-06-29 17:34:53 PDT
Created attachment 543018
Proof of Concept

I think we over-complicated things in the comments. The easiest solution is to transform the value to punycode only when we validate or submit it.

A few comments about this patch:
- I think it would be better to add convertUTF8toACE and convertUTF16toACE methods taking an in-out argument to nsIIDNService, but I'm not sure if there are specific reasons why it wasn't done that way initially;
- nsIIOService::NewURI might already puny-encode the value. At least, <input type='url'> accepts values with UTF-8 characters;
- Tests are missing.
Comment 18 Jonas Sicking (:sicking) PTO Until July 5th 2011-06-29 17:51:25 PDT
Comment on attachment 543018
Proof of Concept

Review of attachment 543018:
-----------------------------------------------------------------

Don't you also need to return punycode for .value? I think you can add that conversion to G/SetValue though. But I like the general approach.

::: content/base/public/nsContentUtils.h
@@ +1723,5 @@
>     */
>    static void InitializeTouchEventTable();
> +
> +  static void TransformToPunycode(nsAString& aValue);
> +  static void TransformToPunycode(nsACString& aValue);

I'd rather this didn't use in-out parameters and instead used separate in and out arguments.
Comment 19 Mounir Lamouri (:mounir) 2011-07-12 02:35:22 PDT
This patch is a bit wrong: the spec doesn't ask us to submit punycoded values for url fields, and these fields actually handle UTF-8 values very well.
I've opened bug 670883 to test UTF-8 values for <input type=url>.
Comment 20 Mounir Lamouri (:mounir) 2011-07-12 02:53:06 PDT
I reopened the W3C bug because I do not agree with the resolution: there is no reason to submit punycoded values for <input type='email'> when we submit UTF-8 values for <input type='text'>. Even if SMTP servers might not accept UTF-8 email addresses, websites are already used to handling that situation. There is no need to be over-protective, I believe.

We should only allow/validate UTF-8 values.
Comment 21 Mounir Lamouri (:mounir) 2011-07-12 04:53:42 PDT
Created attachment 545359
Patch v1

Puny-encode the value before validating it.

I will have to open a follow-up: if an email address contains more than 63 characters without a "." and includes non-ASCII characters, it will not validate, because of an nsIIDNService implementation limitation related to DNS (a DNS label cannot exceed 63 characters).
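To illustrate what "puny-encode before validating" involves, here is a sketch of RFC 3492 Punycode encoding plus a label-by-label ToASCII-style step. It is not the patch's code: in Gecko the conversion goes through nsIIDNService (see comment 16), which also performs nameprep normalization that this sketch omits. PunycodeEncode and LabelsToAce are hypothetical names, the input is assumed to be already decoded to UTF-32, and LabelsToAce uses separate in and out arguments as suggested in the review in comment 18.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Bootstring parameters for Punycode (RFC 3492, section 5).
constexpr std::uint32_t kBase = 36, kTMin = 1, kTMax = 26, kSkew = 38,
                        kDamp = 700, kInitialBias = 72, kInitialN = 128;

// Maps digit values 0..25 to 'a'..'z' and 26..35 to '0'..'9'.
static char EncodeDigit(std::uint32_t aDigit) {
  return static_cast<char>(aDigit < 26 ? 'a' + aDigit : '0' + (aDigit - 26));
}

// Bias adaptation (RFC 3492, section 6.1).
static std::uint32_t Adapt(std::uint32_t aDelta, std::uint32_t aNumPoints,
                           bool aFirstTime) {
  assert(aNumPoints > 0);
  aDelta = aFirstTime ? aDelta / kDamp : aDelta / 2;
  aDelta += aDelta / aNumPoints;
  std::uint32_t k = 0;
  while (aDelta > ((kBase - kTMin) * kTMax) / 2) {
    aDelta /= kBase - kTMin;
    k += kBase;
  }
  return k + (kBase - kTMin + 1) * aDelta / (aDelta + kSkew);
}

// Punycode-encodes one label: no "xn--" prefix, no nameprep, and no
// overflow checks, so it assumes short, label-sized input.
std::string PunycodeEncode(const std::u32string& aLabel) {
  std::string output;
  for (char32_t c : aLabel)
    if (c < 0x80) output += static_cast<char>(c);  // basic code points first
  const std::uint32_t b = static_cast<std::uint32_t>(output.size());
  if (b > 0) output += '-';

  std::uint32_t n = kInitialN, delta = 0, bias = kInitialBias, h = b;
  while (h < aLabel.size()) {
    std::uint32_t m = UINT32_MAX;  // smallest code point >= n
    for (char32_t c : aLabel)
      if (c >= n && c < m) m = c;
    delta += (m - n) * (h + 1);
    n = m;
    for (char32_t c : aLabel) {
      if (c < n) ++delta;
      if (c != n) continue;
      // Emit delta as a generalized variable-length integer.
      std::uint32_t q = delta;
      for (std::uint32_t k = kBase;; k += kBase) {
        const std::uint32_t t =
            k <= bias ? kTMin : (k >= bias + kTMax ? kTMax : k - bias);
        if (q < t) break;
        output += EncodeDigit(t + (q - t) % (kBase - t));
        q = (q - t) / (kBase - t);
      }
      output += EncodeDigit(q);
      bias = Adapt(delta, h + 1, h == b);
      delta = 0;
      ++h;
    }
    ++delta;
    ++n;
  }
  return output;
}

// Hypothetical ToASCII-style step: ASCII labels are copied unchanged, others
// become "xn--" + Punycode. Returns false (leaving aOut untouched) when an
// encoded label is longer than 63 characters, the DNS label limit.
bool LabelsToAce(const std::u32string& aIn, std::string& aOut) {
  std::string result;
  std::size_t start = 0;
  for (;;) {
    const std::size_t dot = aIn.find(U'.', start);
    const std::u32string label = aIn.substr(
        start, dot == std::u32string::npos ? dot : dot - start);
    std::string encoded;
    bool isAscii = true;
    for (char32_t c : label) {
      if (c >= 0x80) isAscii = false;
      encoded += static_cast<char>(c);  // only kept if the label is ASCII
    }
    if (!isAscii) {
      encoded = "xn--" + PunycodeEncode(label);
      if (encoded.size() > 63) return false;
    }
    result += encoded;
    if (dot == std::u32string::npos) break;
    result += '.';
    start = dot + 1;
  }
  aOut = result;
  return true;
}
```

With these helpers, "müller.example.org" becomes "xn--mller-kva.example.org", the same encoding of "müller" that appears in comment 1. A 64-character label containing a non-ASCII character is rejected, while an ASCII-only label of the same length passes through unchanged, which matches the condition described in the comment above.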
Comment 22 :Ehsan Akhgari (busy, don't ask for review please) 2011-09-15 16:20:06 PDT
sicking: ping?
Comment 23 Jonas Sicking (:sicking) PTO Until July 5th 2012-02-26 21:19:20 PST
Comment on attachment 545359
Patch v1

Sorry about the extreme slowness :(

Hot chocolate is on me.
Comment 24 Marco Bonardo [::mak] 2012-03-01 05:58:38 PST
https://hg.mozilla.org/mozilla-central/rev/34d97151ab88
