Closed Bug 656009 Opened 13 years ago Closed 13 years ago

Email address validation doesn't match HTML5 spec - presence of \

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

RESOLVED INVALID

People

(Reporter: gerv, Unassigned)

Details

tl;dr: Usernames can include backslashes in our code, but not in HTML5.

http://mxr.mozilla.org/mozilla-central/source/content/html/content/src/nsHTMLInputElement.cpp#4018

vs.

http://www.whatwg.org/specs/web-apps/current-work/multipage/states-of-the-type-attribute.html#e-mail-state

HTML5 says:

"1*( atext / "." ) "@" ldh-str *( "." ldh-str ) where atext is defined in RFC
5322 section 3.2.3, and ldh-str is defined in RFC 1034 section 3.5."

Mining those RFCs:

   atext           =   ALPHA / DIGIT /    ; Printable US-ASCII
                       "!" / "#" /        ;  characters not including
                       "$" / "%" /        ;  specials.  Used for atoms.
                       "&" / "'" /
                       "*" / "+" /
                       "-" / "/" /
                       "=" / "?" /
                       "^" / "_" /
                       "`" / "{" /
                       "|" / "}" /
                       "~"

<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>

<let-dig-hyp> ::= <let-dig> | "-"

<let-dig> ::= <letter> | <digit>

<letter> ::= any one of the 52 alphabetic characters A through Z in
upper case and a through z in lower case

<digit> ::= any one of the ten digits 0 through 9


That leads to the following, if I'm not mistaken:

/^[a-z0-9.!#$%&'*+\-/=?\^_`{|}~]+@[a-z0-9-]+(\.[a-z0-9-]+)*$/i

which can be reduced to:

/^[\w.!#$%&'*+\-/=?\^`{|}~]+@[a-z0-9-]+(\.[a-z0-9-]+)*$/i
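A quick way to check that the two forms above agree, and to see how each treats a backslash in the local part, is to run both over a few addresses. This is only a sketch: the sample addresses are made up for illustration, and the regexes are transcribed from this comment, not taken from the Gecko source.

```javascript
// Long form, transcribed from the comment above.
const longForm = /^[a-z0-9.!#$%&'*+\-/=?\^_`{|}~]+@[a-z0-9-]+(\.[a-z0-9-]+)*$/i;
// Reduced form: \w stands in for [A-Za-z0-9_].
const shortForm = /^[\w.!#$%&'*+\-/=?\^`{|}~]+@[a-z0-9-]+(\.[a-z0-9-]+)*$/i;

// Hypothetical sample addresses, chosen only to exercise the grammar.
const samples = [
  "gerv@example.org",
  "first.last+tag@mail.example.com",
  "o'brien@example.org",
  "foo\\bar@mail.com",      // backslash in the local part
  "no-at-sign.example.org", // missing "@"
  "user@",                  // empty domain
  "user@exa_mple.org",      // underscore is not part of ldh-str
];

// Record what each form says about each address.
const results = samples.map((address) => ({
  address,
  longForm: longForm.test(address),
  shortForm: shortForm.test(address),
}));

for (const r of results) {
  console.log(r.address, r.longForm, r.shortForm);
}
```

If the reduction is right, the two columns should be identical for every address.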


I note a discrepancy: usernames can include backslashes in our code, but not in HTML5.

Also, do we really not have access to a regex engine in this code? Surely that would be more efficient, even if it involved a call to JS?

Perhaps it's a different issue, but this is not very forward-compatible - isn't someone working on a spec for allowing Unicode characters in the local part?

Gerv
Actually, '\' is not allowed. What you read as a '\' is actually '\'' in the source: I had to escape the ' character.

You can easily test that with this data URL:
data:text/html,<style>:invalid { box-shadow: 0 0 1.5px 1px red; }</style><input type='email' value='foo\bar@mail.com'>

And I do not think that we need to use a regexp for the email address. We already use a regexp for the pattern attribute (through the JS engine), so it's technically doable, but it doesn't seem useful given that this code is readable right now. For IDN, the spec requires Punycode, so the validation isn't going to change AFAIUI.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → INVALID
Component: Layout: Form Controls → DOM: Core & HTML
QA Contact: layout.form-controls → general
Version: unspecified → Trunk
You are quite right - fair enough :-) Thanks.

My reason for suggesting a regexp engine is that this code performs 2 function calls and 20 comparisons per character for the local part, and 2 function calls per character for the domain part, and the new definitions of moz-is-valid and so on do, in some cases, cause validity to be checked after each change to the input field. Perhaps it's still so tiny as to make no difference; I don't know. But it seems inefficient to me :-)

Gerv
I don't have any data on that but I would bet that any regexp will be slower than this code given that it is really specific to what we want.
I'm no expert on regexps, but I know the regexp engine compiles them (presumably once, if you write it right) and then they are very fast. For a straightforward one like this, I'm sure it would be better than 20 comparisons and 2 function calls per character!

We could either ask Brendan or a JS person, or we could decide it's not performance-critical anyway. :-)
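A third option would be to measure it. The sketch below is only an illustration in JS: `isValidByLoop` is a hypothetical per-character validator written for this comparison, not the code in nsHTMLInputElement.cpp, and the regexp is the reduced form from the first comment. The inputs and round count are arbitrary, and timings from a harness like this say nothing definite about the C++ code path.

```javascript
// Punctuation allowed in the local part besides letters and digits
// (atext plus "."), per the grammar quoted earlier in this bug.
const ATEXT_EXTRA = ".!#$%&'*+-/=?^_`{|}~";

function isAlnum(c) {
  return (c >= "a" && c <= "z") || (c >= "A" && c <= "Z") || (c >= "0" && c <= "9");
}

// Hypothetical hand-written validator: one pass, character by character.
function isValidByLoop(value) {
  const at = value.indexOf("@");
  // Need a non-empty local part and a non-empty domain.
  if (at <= 0 || at === value.length - 1) return false;
  for (let i = 0; i < at; i++) {
    const c = value[i];
    if (!isAlnum(c) && !ATEXT_EXTRA.includes(c)) return false;
  }
  // Domain: dot-separated, non-empty labels of letters, digits and hyphens.
  let labelLength = 0;
  for (let i = at + 1; i < value.length; i++) {
    const c = value[i];
    if (c === ".") {
      if (labelLength === 0) return false;
      labelLength = 0;
    } else if (isAlnum(c) || c === "-") {
      labelLength++;
    } else {
      return false;
    }
  }
  return labelLength > 0;
}

// Regexp version; the literal is compiled once and reused for every call.
const EMAIL_RE = /^[\w.!#$%&'*+\-/=?\^`{|}~]+@[a-z0-9-]+(\.[a-z0-9-]+)*$/i;
function isValidByRegex(value) {
  return EMAIL_RE.test(value);
}

// Run fn over all inputs `rounds` times; report elapsed ms and valid count.
function timeIt(fn, inputs, rounds) {
  const start = Date.now();
  let valid = 0;
  for (let r = 0; r < rounds; r++) {
    for (const s of inputs) {
      if (fn(s)) valid++;
    }
  }
  return { ms: Date.now() - start, valid };
}

// Made-up inputs: two valid addresses and two invalid ones.
const inputs = [
  "gerv@example.org",
  "foo\\bar@mail.com",
  "a.very.long.local.part+tag@sub.domain.example.com",
  "missing-at.example.org",
];

const loopResult = timeIt(isValidByLoop, inputs, 20000);
const regexResult = timeIt(isValidByRegex, inputs, 20000);
console.log("loop:", loopResult, "regex:", regexResult);
```

The `valid` counts are there as a sanity check that both functions accept the same inputs; only the `ms` values are meant to differ.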

Gerv