Closed Bug 1064430 Opened 10 years ago Closed 9 years ago

input[email] sets input.validity.badInput, which is should not

Categories

(Core :: DOM: Core & HTML, defect)

35 Branch
x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla38

People

(Reporter: tbosch, Assigned: jwatt)

References

Details

Attachments

(3 files, 1 obsolete file)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.44 Safari/537.36

Steps to reproduce:

Create a form with `input[email]`, then enter an invalid email address, e.g. 'invalid-email`.


Actual results:

both, `element.validity.badInput` and `element.validity.typeMismatch` are set to `true`


Expected results:

Only `element.validity.typeMismatch` should be set to true, but `input.validity.badInput` should be false.

Spec http://www.whatwg.org/specs/web-apps/current-work/multipage/forms.html#e-mail-state-(type=email):
Bad input:
> While the user interface is representing input that the user agent
> cannot convert to punycode, the control is suffering from bad input.

Type mismatch:
> While the value of the element is neither the empty string nor a
> single valid e-mail address, the element is suffering from a type mismatch.
Here is a jsfiddle showing the problem: http://jsfiddle.net/qt17xe79/1/
Blocks: 827161
Component: Untriaged → DOM
Product: Firefox → Core
HTMLInputElement::HasBadInput does:

6576       if (!PunycodeEncodeEmailAddress(tokenizer.nextToken(), unused, &unused2)) {
6577         return true;
6578       }

and PunycodeEncodeEmailAddress does:

6506   uint32_t atPos = (uint32_t)value.FindChar('@');
6507   // Email addresses must contain a '@', but can't begin or end with it.
6508   if (atPos == (uint32_t)kNotFound || atPos == 0 || atPos == length - 1) {
6509     return false;
6510   }

Looking at the spec, it doesn't actually define what it means by "convert to punycode".  PunycodeEncodeEmailAddress has explicit comments about needing to split on the '@' and encode the username and domain as separate labels, which is why it's looking for the '@'.

It's hard to tell whether the behavior we have is a bug or not given the spec's lack of definition of "convert to punycode".  :(
Jonathan, Olli, what's the story with the '@' handling here?

I just tried a few different utilities on the web that claim to convert things to punycode.  Some of them just pass the '@' through.  Some barf on it, claiming the input can't be converted to punycode...
Flags: needinfo?(jwatt)
Flags: needinfo?(bugs)
I can't remember the reason for the @ check.

Filed spec bug https://www.w3.org/Bugs/Public/show_bug.cgi?id=26812
Flags: needinfo?(bugs)
I don't have anything on this.
Flags: needinfo?(jwatt)
Jonathan, you wrote this code....  I was sort of hoping you remembered why you went to the trouble of splitting on the '@' here.  Can we just remove it?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(jwatt)
I didn't come up with that behavior I'm afraid. It came from the existing code that I refactored:

https://bugzilla.mozilla.org/attachment.cgi?id=8369128&action=diff

The comment for PunycodeEncodeEmailAddress which says:

 * This function exists because ConvertUTF8toACE() treats 'username@domain' as
 * a single label, but we need to encode the username and domain parts
 * separately.

is also a refactoring of a comment in that preexisting code.
Flags: needinfo?(jwatt)
Ah, I see.

How about we remove that and see whether any tests fail?
(So the code is from bug 884332)
So if passed "user.name@foo.bar.baz", nsIDNService::ConvertUTF8toACE will puny-encode "user", "name@foo", "bar" and "baz" separately. Treating "name@foo" as a "label" (with the 63 char limit on lables) does indeed seem wrong.

http://www.ietf.org/rfc/rfc3492.txt says that puny-encoding is done on "labels", and section 3.5 of http://www.ietf.org/rfc/rfc1034.txt doesn't seem to make allowances for "@" either in labels or as a label separator. So it's not even clear to me from that, or from the fact that puny-encoding is intended for _domains_, that "puny-encoding an email address" is a valid thing to do. If it is, it's not clear to me that the username part should be encoded.

Then again it's almost 3am and I probably shouldn't be trying to make sense of unfamiliar RFCs, so I could have got this all wrong.

Besides all this, it could be argued that our PunycodeEncodeEmailAddress should just pass the entire string to ConvertUTF8toACE if it doesn't contain "@", which would fix this bug.
> it's not even clear to me from that, or from the fact that puny-encoding is intended for
> _domains_, that "puny-encoding an email address" is a valid thing to do.

That might be good feedback for https://www.w3.org/Bugs/Public/show_bug.cgi?id=26812 fwiw.
Done. (The only reason I hadn't done so is because I thought http://www.w3.org/accounts/recover was supposed to also reset my w3.org bugzilla password and it was just taking time to propagate - turns out it's not that linked up and it's a separate password.)
Assignee: nobody → jwatt
We should at least do this given the spec changes Hixie made in response to the W3C bug. Looking into the validity parts still.
Attachment #8555886 - Flags: review?(bzbarsky)
I had to change some tests because it seems the encoding of the username part has been causing us to fail to report errors where the username part contains characters that shouldn't be allowed per:

https://html.spec.whatwg.org/multipage/forms.html#valid-e-mail-address
http://tools.ietf.org/html/rfc5322#section-3.2.3
Attachment #8556045 - Flags: review?(bzbarsky)
Right now the tests only check for type mismatch. This updates them to check for bad input too.
Attachment #8556048 - Flags: review?(bugs)
Attachment #8556048 - Attachment is patch: true
Comment on attachment 8555886 [details] [diff] [review]
part 1 - only encode the domain lables

s/lables/labels/ throughout, please (both in commit message and code comments).

r=me
Attachment #8555886 - Flags: review?(bzbarsky) → review+
Comment on attachment 8556045 [details] [diff] [review]
don't set the 'badInput' state on <input type=email> for punycode encoding failures

The meaning of aIndexOfAt in the case when the string has no '@' needs documentation.

Why do we want to treat the whole string as a domain name if there is no '@' instead of treating it all as a username?

>+    if (!HasAttr(kNameSpaceID_None, nsGkAtoms::multiple)) {

I'm not following this bit (and in particular, it doesn't seem to match the checkin comment)....
Attachment #8556045 - Flags: review?(bzbarsky) → review-
(In reply to Boris Zbarsky [:bz] from comment #17)
> Why do we want to treat the whole string as a domain name if there is no '@'
> instead of treating it all as a username?

No good reason. I guess we would be better to treat it as a the username to give priority to the "not an email address" error, rather than essentially reporting "email address' domain can't be punycode encoded".

> >+    if (!HasAttr(kNameSpaceID_None, nsGkAtoms::multiple)) {
> 
> I'm not following this bit (and in particular, it doesn't seem to match the
> checkin comment)....

Hmm, yeah I guess this is a case of wanting the "not an email address" error to take precedence. Reverted.
Attachment #8556045 - Attachment is obsolete: true
Attachment #8556092 - Flags: review?(bzbarsky)
Comment on attachment 8556092 [details] [diff] [review]
part 2 - don't set the 'badInput' state on <input type=email> for punycode encoding failures

OK.  So need to document that if PunycodeEncodeEmailAddress returned false then aEncodedEmail should not be examined, right?

>-  [ 'this.is.email.should.be.longer.than.sixty.four.characters.föö@mözillä.tld', true ],

Could you add a test with a long username containing only ASCII that tests that we consider it valid?

r=me with those nits fixed.
Attachment #8556092 - Flags: review?(bzbarsky) → review+
Sure thing.
Attachment #8556048 - Flags: review?(bugs) → review+
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: