Last Comment Bug 555559 - Implement <input type="email">
: Implement <input type="email">
Status: RESOLVED FIXED
[parity-opera]
: dev-doc-complete, html5
Product: Core
Classification: Components
Component: DOM: Core & HTML (show other bugs)
: Trunk
: All All
: -- enhancement with 2 votes (vote)
: mozilla2.0b5
Assigned To: Mounir Lamouri (:mounir)
:
Mentors:
http://dev.w3.org/html5/spec/forms.ht...
: 573489 (view as bug list)
Depends on: 345624 345822 581652 618876
Blocks: html5forms 562625 556817 558370 558651 558788 559309 559759 561635 566348 581596 583586
  Show dependency treegraph
 
Reported: 2010-03-28 10:02 PDT by Mounir Lamouri (:mounir)
Modified: 2011-11-27 18:16 PST (History)
22 users (show)
mounir: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Tests v1 (8.64 KB, patch)
2010-04-13 17:07 PDT, Mounir Lamouri (:mounir)
no flags Details | Diff | Splinter Review
Patch v1 (20.66 KB, patch)
2010-04-13 17:09 PDT, Mounir Lamouri (:mounir)
bugs: review-
Details | Diff | Splinter Review
Tests v1.1 (8.77 KB, patch)
2010-04-13 17:24 PDT, Mounir Lamouri (:mounir)
bugs: review-
Details | Diff | Splinter Review
Tests v2 (9.96 KB, patch)
2010-04-14 15:57 PDT, Mounir Lamouri (:mounir)
bugs: review-
Details | Diff | Splinter Review
Patch v2 (25.35 KB, patch)
2010-04-14 16:06 PDT, Mounir Lamouri (:mounir)
no flags Details | Diff | Splinter Review
Patch v2.1 (22.35 KB, patch)
2010-04-15 02:07 PDT, Mounir Lamouri (:mounir)
bugs: review-
Details | Diff | Splinter Review
Tests v3 (11.07 KB, patch)
2010-04-19 08:51 PDT, Mounir Lamouri (:mounir)
no flags Details | Diff | Splinter Review
Patch v3 (24.22 KB, patch)
2010-04-19 08:55 PDT, Mounir Lamouri (:mounir)
no flags Details | Diff | Splinter Review
Tests v3.1 (12.99 KB, patch)
2010-04-20 05:03 PDT, Mounir Lamouri (:mounir)
bugs: review+
Details | Diff | Splinter Review
Patch v3.1 (26.12 KB, patch)
2010-04-20 05:04 PDT, Mounir Lamouri (:mounir)
bugs: review-
Details | Diff | Splinter Review
Tests v3.2 (13.36 KB, patch)
2010-04-25 04:43 PDT, Mounir Lamouri (:mounir)
no flags Details | Diff | Splinter Review
Tests v3.3 (13.44 KB, patch)
2010-04-26 09:11 PDT, Mounir Lamouri (:mounir)
no flags Details | Diff | Splinter Review
Patch v4 (27.60 KB, patch)
2010-04-26 09:13 PDT, Mounir Lamouri (:mounir)
bugs: review+
Details | Diff | Splinter Review
Patch v4.1 (27.70 KB, patch)
2010-05-04 02:48 PDT, Mounir Lamouri (:mounir)
no flags Details | Diff | Splinter Review
Patch v5 (25.85 KB, patch)
2010-05-11 09:04 PDT, Mounir Lamouri (:mounir)
no flags Details | Diff | Splinter Review
Tests v3.4 (11.59 KB, patch)
2010-05-26 18:59 PDT, Mounir Lamouri (:mounir)
no flags Details | Diff | Splinter Review
Patch v6 (24.47 KB, patch)
2010-05-26 19:00 PDT, Mounir Lamouri (:mounir)
jonas: review+
Details | Diff | Splinter Review
Patch v6.1 (24.59 KB, patch)
2010-05-27 04:27 PDT, Mounir Lamouri (:mounir)
jst: superreview-
Details | Diff | Splinter Review
Patch v6.2 (45.31 KB, patch)
2010-06-07 06:06 PDT, Mounir Lamouri (:mounir)
no flags Details | Diff | Splinter Review
Patch v6.3 (42.07 KB, patch)
2010-06-07 06:58 PDT, Mounir Lamouri (:mounir)
jst: superreview+
Details | Diff | Splinter Review
Patch v6.4 (37.06 KB, patch)
2010-07-28 06:37 PDT, Mounir Lamouri (:mounir)
no flags Details | Diff | Splinter Review
Patch v6.5 (37.09 KB, patch)
2010-08-01 08:40 PDT, Mounir Lamouri (:mounir)
no flags Details | Diff | Splinter Review

Description Mounir Lamouri (:mounir) 2010-03-28 10:02:19 PDT
HTML5 Forms introduces a new state for input elements. The email state basically lets the user put an email then it can be validated with the constraint validation api (bug 345624).
Other things could be done with follow-up bugs like implementation of multiple attribute.
Comment 1 Ed Lee :Mardak 2010-04-01 19:14:35 PDT
Dolske: Any suggestions for adding in searching of email contacts?

Currently, nsFormFillController chooses between searching with the login manager or satchel.
http://hg.mozilla.org/mozilla-central/annotate/fb6fa1f88790/toolkit/components/satchel/src/nsFormFillController.cpp#l496

The nsIDOMHTMLInputElement is available here as mFocusedInput, so it could check the type and dynamically dispatch to the appropriate nsIFormAutoComplete.autoCompleteSearch. This would need some way to allow nsIFormAutoCompletes to register which types they support. I suppose nsILoginManager wouldn't need to duplicate the interface and nsLoginManager could register itself as type=password.. maybe.

Alternatively, satchel could just handle all of these as it does now, and it also gets the html input, so it can decide what to do with the special type such as finding results that match the input's name OR the input's type.

However, looking at Contacts from Labs, people/email probably will live in a separate database/table as it's more complex than just previous-value retrieval. So not overloading satchel might be better..
Comment 2 Mounir Lamouri (:mounir) 2010-04-13 17:07:32 PDT
Created attachment 438894 [details] [diff] [review]
Tests v1
Comment 3 Mounir Lamouri (:mounir) 2010-04-13 17:09:45 PDT
Created attachment 438895 [details] [diff] [review]
Patch v1

This bug follows bug 456229 and bug 558594 so, Olli, I think you are the best person to review it.
Comment 4 Mounir Lamouri (:mounir) 2010-04-13 17:20:16 PDT
The input element in the email state uses the multiple attributes. I think that could be done in a follow-up.
Comment 5 Mounir Lamouri (:mounir) 2010-04-13 17:24:46 PDT
Created attachment 438902 [details] [diff] [review]
Tests v1.1

I've added tests to check the email is invalid when surrounded by spaces.
Comment 6 Olli Pettay [:smaug] 2010-04-14 04:47:20 PDT
Could you implement support for 'multiple' in this bug?
The patch isn't too large yet.
Comment 7 Olli Pettay [:smaug] 2010-04-14 04:48:01 PDT
Comment on attachment 438902 [details] [diff] [review]
Tests v1.1

I hoping to get tests for "multiple"
Comment 8 Olli Pettay [:smaug] 2010-04-14 04:48:25 PDT
Comment on attachment 438895 [details] [diff] [review]
Patch v1

Same here.
Comment 9 Mounir Lamouri (:mounir) 2010-04-14 15:57:59 PDT
Created attachment 439119 [details] [diff] [review]
Tests v2
Comment 10 Mounir Lamouri (:mounir) 2010-04-14 16:06:58 PDT
Created attachment 439123 [details] [diff] [review]
Patch v2

I thought I will have to do a dynamic check with the multiple attribute but as long as we check validity only with js calls, I think we can parse the value each time.
I think implementing :valid and :invalid pseudo-element will need some changes but as it will surely not be the only change, let's keep a trivial implementation until :valid implementation.

I wrote my own function to split the email list because I think calling some helpers would be a loss in performance. At the moment, the string is read twice at worst. It could be better (read once) if I wasn't using |IsValidEmailAddress()|. If you think performance is less important that readability in this situation, I can change that.

For the function validating a simple email address I think it's quite easy to understand it and it's probably much more better than a regexp.
Comment 11 Ed Lee :Mardak 2010-04-14 18:54:07 PDT
(In reply to comment #10)
> Created an attachment (id=439123) [details]
Seems like you attached a patch for type=search.

> For the function validating a simple email address I think it's quite easy to
> understand it and it's probably much more better than a regexp.
Curious, the validating will only set those :valid pseudo-elements and not prevent other inputs, yes? For example, Édwärd is an invalid email (both before and after the @), but that wouldn't prevent me from typing it in? This is important for a smarter formfill that can match by any string to suggest an actual email address.
Comment 12 Mounir Lamouri (:mounir) 2010-04-15 02:07:31 PDT
Created attachment 439195 [details] [diff] [review]
Patch v2.1

Indeed, it was the input type search patch.
Thank you for warning me Edward :)
Comment 13 Mounir Lamouri (:mounir) 2010-04-15 02:13:16 PDT
(In reply to comment #11)
> (In reply to comment #10)
> > For the function validating a simple email address I think it's quite easy to
> > understand it and it's probably much more better than a regexp.
> Curious, the validating will only set those :valid pseudo-elements and not
> prevent other inputs, yes? For example, Édwärd is an invalid email (both before
> and after the @), but that wouldn't prevent me from typing it in? This is
> important for a smarter formfill that can match by any string to suggest an
> actual email address.

Right, you can type whatever you want even if it's recognized as invalid. But the CSS pseudo-element will have to be quite real-time AFAIK so if you type "foo@foo." is will not be valid but when "foo@foo.c" will be typed, the form field should be recognized as valid. I don't think that would be a good idea to parse the entire string each time a user do an input in the field.
Anyway, I've not read this specification at the moment...
Comment 14 Olli Pettay [:smaug] 2010-04-18 14:58:31 PDT
Comment on attachment 439119 [details] [diff] [review]
Tests v2

in for loops add space before and after <

You don't seem to test other white space characters than " "
Comment 15 Olli Pettay [:smaug] 2010-04-18 15:09:48 PDT
Comment on attachment 439195 [details] [diff] [review]
Patch v2.1

+nsHTMLInputElement::IsValidEmailAddressList(const nsAString& aValue)
doesn't handle all the white space characters.

There is a typo 'Trairing'.


Nit,
    }
    else if (c == ' ') {
->
    } else if (c == ' ') {



+    endEmail = length-1;
+  }
Space before and after -


I think adding set-of-comma-separated-tokens algorithm to
a method of nsContentUtils might be useful. It could take a
pointer to some function, which would be called for
each token.
Though, I wonder if we already have something like it somewhere. nsCRT::strtok(..., ", \t", ...) is quite close.
Comment 16 Mounir Lamouri (:mounir) 2010-04-18 17:30:39 PDT
So, HTML5 space characters are: ' ', '\n', '\r', '\t' and '\f'. I'm wondering why '\f' (first time I heard about this one) ? I'm also wondering why nsHTMLInputElement.cpp sets: |kWhitespace[] = "\n\r\t\b";|. Why is '\b' considered as a white space character ?

In addition, in my opinion, |email.value='f\roo@bar.com'| is a valid email value because '\r' should be stripped (like '\n') but at the moment, return values are not stripped but changed in white spaces. I'm wondering if I'm misinterpreting the specifications or if there is a bug here ? Webkit behavior is like our current behavior but Opera's is what I think may be the right one.

Indeed, we could add a set-of-comma-separated-tokens algorithm using strtok.
Comment 17 Olli Pettay [:smaug] 2010-04-19 00:56:42 PDT
What I'm mainly concerned (in practice) is \t.

\b is there because of some historic reason. HTML5 may very well miss it, or
perhaps the reasons have changed. Bug 114997 seems to have just copied those
whitespace values from parser.
Comment 18 Mounir Lamouri (:mounir) 2010-04-19 08:51:11 PDT
Created attachment 439938 [details] [diff] [review]
Tests v3

I've added some tests in 'todo' because they need to have '\r' and '\n' stripped.
Comment 19 Mounir Lamouri (:mounir) 2010-04-19 08:55:51 PDT
Created attachment 439939 [details] [diff] [review]
Patch v3

I did not add a function checking for the validity of a set of token separated by commas because email list are the only one needing to be checked however, I've added a function returning the next token in a token list separated by commas.
Comment 20 Mounir Lamouri (:mounir) 2010-04-20 05:03:45 PDT
Created attachment 440188 [details] [diff] [review]
Tests v3.1

Now adding tests to the 'required' attribute tests as the patch has now to be applied on top of the required patch.
Comment 21 Mounir Lamouri (:mounir) 2010-04-20 05:04:53 PDT
Created attachment 440190 [details] [diff] [review]
Patch v3.1

Updated patch now using the required patch (see bug 345822).
Comment 22 Olli Pettay [:smaug] 2010-04-21 10:32:17 PDT
Comment on attachment 440188 [details] [diff] [review]
Tests v3.1

Few nits

>+++ b/content/html/content/test/test_bug555559.html	Tue Apr 20 12:43:49 2010 +0200
...
>+function todoCheckValidEmailAddress(element)
>+{
>+  todo(!element.validity.typeMismatch,
>+    "Element should not suffer from type mismatch");
Align "Element under !element
>+  todo(element.validity.valid, "Element should be valid");
>+  todo(element.checkValidity(), "Element should be valid");
>+  todo_is(element.validationMessage, '',
>+    "Validation message should be the empty string");
similar thing here

>+function checkValidEmailAddress(element)
>+{
>+  ok(!element.validity.typeMismatch,
>+    "Element should not suffer from type mismatch");
And here

>+  ok(element.validity.valid, "Element should be valid");
>+  ok(element.checkValidity(), "Element should be valid");
>+  is(element.validationMessage, '',
>+    "Validation message should be the empty string");
And here

>+for (var i=0; i<legalCharacters.length; i++)
Space before and after <, and { should be in the same
line as 'for'

>+for (var i=0; i<illegalCharacters.length; i++)
>+{
ditto

>+for (var i=0; i<illegalCharacters.length; i++)
>+{
and here


Could you add some tests for invalid event. 
I know test_bug345624-1.html has some tests, but would be great to make
sure that the event works properly also with "email"
Comment 23 Mounir Lamouri (:mounir) 2010-04-25 04:43:58 PDT
Created attachment 441350 [details] [diff] [review]
Tests v3.2

r=smaug
Comment 24 Olli Pettay [:smaug] 2010-04-26 07:11:46 PDT
Comment on attachment 440190 [details] [diff] [review]
Patch v3.1

I assume there will be a new less-buggy patch coming ;)
And please add more testcases for parsing.
Comment 25 Mounir Lamouri (:mounir) 2010-04-26 09:11:34 PDT
Created attachment 441505 [details] [diff] [review]
Tests v3.3

r=smaug

I've added one test to check the case when multiple commas are in the middle of a list. We can't do much more test related to token list in this patch because the only way we can test is checking the validity, not the different tokens.
Comment 26 Mounir Lamouri (:mounir) 2010-04-26 09:13:02 PDT
Created attachment 441507 [details] [diff] [review]
Patch v4

I've cleaned up |IsValidEmailAddressList| method.
Comment 27 Olli Pettay [:smaug] 2010-05-03 09:27:05 PDT
Comment on attachment 441507 [details] [diff] [review]
Patch v4


>+PRBool nsContentUtils::GetNextTokenFromCommaSeparatedList(nsAString::const_iterator& beginString,
>+                                                          const nsAString::const_iterator& endString,
>+                                                          nsAString::const_iterator& beginToken,
>+                                                          nsAString::const_iterator& endToken)
>+{
>+  // We are using |beginString| as an iterator. It prevents us to re-setting
>+  // |beginString| to |iter| at the end, especially to forget doing that ;).
>+
>+  while (beginString != endString && IsHTMLWhitespace(*beginString)) {
>+    // Igoring leading white spaces.
>+    ++beginString;
>+  }
>+
>+  beginToken = beginString;
>+
>+  while (beginString != endString && !IsHTMLWhitespace(*beginString) && *beginString != ',') {
>+    // The token ends when a white spaces or a comma is found.
>+    ++beginString;
>+  }
>+
>+  endToken = beginString;
>+
>+  while (beginString != endString && *beginString != ',') {
>+    // Ignoring trailing white spaces.
>+    ++beginString;
>+  }
>+
>+  if (*beginString == ',') {
>+    // If we are ending because of a comma, we don't want it to be in the
>+    // returned list.
>+    ++beginString;
>+
>+    // If we are ending with a comma, there must be a token following.
>+    return PR_TRUE;
>+  }
>+
>+  // Next time, we want to parse the list from here.
>+  beginString = beginString;
This looks wrong. And probably means that there is some testcase missing.


If you check/fix that, and hopefully even find a testecase for it, r=me

I think a second review would be good for this because it is easy to make
small mistakes in parsing.
Comment 28 Mounir Lamouri (:mounir) 2010-05-04 02:48:22 PDT
Created attachment 443314 [details] [diff] [review]
Patch v4.1

Actually, the reason of this is explained at the beginning of the function:
>  // We are using |beginString| as an iterator. It prevents us to re-setting
>  // |beginString| to |iter| at the end, especially to forget doing that ;).

So, I've done a s/iter/beginString/ then, the final |beginString = iter;| became |beginString = beginString;| and I didn't notice it. This is a stupid mistake but harmless as it is not changing the behavior.
Comment 29 Olli Pettay [:smaug] 2010-05-04 05:37:53 PDT
I meant just one another review, not review + sr ;)
Comment 30 Jonas Sicking (:sicking) PTO Until July 5th 2010-05-10 15:07:01 PDT
Comment on attachment 443314 [details] [diff] [review]
Patch v4.1

Olli: You meant for you to do that other review, right? Not for another person to do it.

If you do want me to review please assign the r? back to me.
Comment 31 Olli Pettay [:smaug] 2010-05-10 15:11:23 PDT
Comment on attachment 443314 [details] [diff] [review]
Patch v4.1

I meant another eyes :)
Comment 32 Jonas Sicking (:sicking) PTO Until July 5th 2010-05-10 18:09:16 PDT
Could you use nsCharSeparatedTokenizer instead?
Comment 33 Olli Pettay [:smaug] 2010-05-11 02:10:57 PDT
As I wondered in comment 15, we seem to have something for tokenization after all :) And nsCharSeparatedTokenizer has rather nice API.
Comment 34 Mounir Lamouri (:mounir) 2010-05-11 02:37:22 PDT
Indeed, that sounds good. I will just have to add a flag to use |IsHTMLWhitespace| instead of the list here (which could be |NS_IsAsciiWhitespace| or |IsAsciiSpace|). By the way, do you have any idea why there is a check for |aChar <= ' '| in |isWhitespace| ? Is that intend to be an optimization ?
> PRBool isWhitespace(PRUnichar aChar)
> {
>   return aChar <= ' ' &&
>          (aChar == ' ' || aChar == '\n' ||
>           aChar == '\r'|| aChar == '\t');
> }
Comment 35 Mounir Lamouri (:mounir) 2010-05-11 09:04:43 PDT
Created attachment 444677 [details] [diff] [review]
Patch v5

r=smaug

This patch is using |nsCharSeparatedTokenizer| instead of creating a method in 
|nsContentUtils|.
Comment 36 Jonas Sicking (:sicking) PTO Until July 5th 2010-05-25 15:06:39 PDT
Comment on attachment 444677 [details] [diff] [review]
Patch v5

>+nsHTMLInputElement::HasTypeMismatch()
>+{
>+  if (mType == NS_FORM_INPUT_EMAIL) {
>+    nsAutoString value;
>+    NS_ENSURE_SUCCESS(GetValue(value), PR_FALSE);
>+
>+    if (value.IsEmpty()) {
>+      return PR_FALSE;
>+    }
>+
>+    if (HasAttr(kNameSpaceID_None, nsGkAtoms::multiple)) {
>+      return !IsValidEmailAddressList(value);
>+    }
>+
>+    return !IsValidEmailAddress(value);

return HasAttr(kNameSpaceID_None, nsGkAtoms::multiple) ?
       !IsValidEmailAddressList(value) :
       !IsValidEmailAddress(value)

Or, IMHO even better, inline the IsValidEmailAddressList implementation into this function. Seems unnecessary to have a separate function for now. If we end up needing in it in other places then we can always break it out into a separate function then (which likely would live in another class).


>@@ -3422,23 +3478,98 @@ nsHTMLInputElement::GetValidationMessage
>         default:
>           key.Assign("TextElementSuffersFromBeingMissing");
>       }
>       rv = nsContentUtils::GetLocalizedString(nsContentUtils::eDOM_PROPERTIES,
>                                               key.get(), message);
>       aValidationMessage = message;
>       break;
>     }
>+    case VALIDATION_MESSAGE_TYPE_MISMATCH:
>+    {
>+      nsXPIDLString message;
>+      rv = nsContentUtils::GetLocalizedString(nsContentUtils::eDOM_PROPERTIES,
>+                                              "ElementSuffersFromInvalidEmail",
>+                                              message);
>+      aValidationMessage = message;
>+      break;
>+    }

At least assert that mType == NS_FORM_INPUT_EMAIL

>+//static
>+PRBool
>+nsHTMLInputElement::IsValidEmailAddressList(const nsAString& aValue)
>+{
>+  nsCharSeparatedTokenizer tokenizer(aValue, ',', nsCharSeparatedTokenizer::HTML_WHITESPACE);
>+
>+  while (tokenizer.hasMoreTokens()) {
>+    if (!IsValidEmailAddress(tokenizer.nextToken())) {
>+      return PR_FALSE;
>+    }
>+  }
>+
>+  return !tokenizer.lastTokenEndedWithSeparator();
>+}

Have you checked that this does the right thing with regards to values like:

",foo@bar.com,   , foo@bar.com  ,  "

I.e. with regards to empty entries in the beginning, middle and end. And with regards to whitespace trimming. Hint, you might need to use lastTokenEndedWithSeparator()

>+PRBool
>+nsHTMLInputElement::IsValidEmailAddress(const nsAString& aValue)
>+{
>+  PRBool parsingDomain = PR_TRUE;
>+  PRUint32 dotIdx = 0;
>+  PRUint32 length = aValue.Length();
>+
>+  for (PRUint32 i=0; i<length; i++) {
>+    PRUnichar c = aValue[length - i - 1];
>+
>+    if (parsingDomain) {
>+      if (c == '@') {
>+        // The domain part has to contain a dot (.).
>+        if (dotIdx == 0) {
>+          return PR_FALSE;
>+        }
>+        // The dot (.) can't be the first domain character.
>+        if (dotIdx == i - 1) {
>+          return PR_FALSE;
>+        }
>+        // The '@' is the first character, there is no username.
>+        if (i == length - 1) {
>+          return PR_FALSE;
>+        }
>+        parsingDomain = PR_FALSE;
>+      } else if (c == '.') {
>+        // The dot (.) can't be at the end of the address.
>+        if (i == 0) {
>+          return PR_FALSE;
>+        }
>+        dotIdx = i;
>+      } else if (!(nsCRT::IsAsciiAlpha(c) || nsCRT::IsAsciiDigit(c) ||
>+                   c == '-')) {
>+        return PR_FALSE;
>+      }
>+    } else { // Parsing the username part.
>+      if (!(nsCRT::IsAsciiAlpha(c) || nsCRT::IsAsciiDigit(c) ||
>+            c == '.' || c == '!' || c == '#' || c == '$' || c == '%' ||
>+            c == '&' || c == '\''|| c == '*' || c == '+' || c == '-' ||
>+            c == '/' || c == '=' || c == '?' || c == '^' || c == '_' ||
>+            c == '`' || c == '{' || c == '|' || c == '}' || c == '~')) {
>+        return PR_FALSE;
>+      }
>+    }
>+  }

Personally I think this would be more readable as two for-loops, rather than this state-machine approach. Up to you.

>diff -r ae44e4c86c77 xpcom/ds/nsCharSeparatedTokenizer.h
>@@ -151,19 +153,21 @@ public:
> private:
>     nsSubstring::const_char_iterator mIter, mEnd;
>     PRPackedBool mLastTokenEndedWithSeparator;
>     PRUnichar mSeparatorChar;
>     PRUint32  mFlags;
> 
>     PRBool isWhitespace(PRUnichar aChar)
>     {
>-        return aChar <= ' ' &&
>-               (aChar == ' ' || aChar == '\n' ||
>-                aChar == '\r'|| aChar == '\t');
>+      if (mFlags & HTML_WHITESPACE) {
>+        return nsContentUtils::IsHTMLWhitespace(aChar);
>+      }
>+
>+      return nsCRT::IsAsciiSpace(aChar);

return mFlags & HTML_WHITESPACE ?
       nsContentUtils::IsHTMLWhitespace(aChar) :
       nsCRT::IsAsciiSpace(aChar)

looks good otherwise. Would like to know that the nsCharSeparatedTokenizer does the right thing when it comes to empty tokens before putting my r=me on the patch though, as I suspect it currently does not.
Comment 37 Jonas Sicking (:sicking) PTO Until July 5th 2010-05-26 11:37:09 PDT
Comment on attachment 444677 [details] [diff] [review]
Patch v5

Unsetting request. Rerequest if the current patch is indeed correct.
Comment 38 Mounir Lamouri (:mounir) 2010-05-26 17:17:06 PDT
(In reply to comment #36)
> (From update of attachment 444677 [details] [diff] [review])
> >+nsHTMLInputElement::HasTypeMismatch()
> >+{
> >+  if (mType == NS_FORM_INPUT_EMAIL) {
> >+    nsAutoString value;
> >+    NS_ENSURE_SUCCESS(GetValue(value), PR_FALSE);
> >+
> >+    if (value.IsEmpty()) {
> >+      return PR_FALSE;
> >+    }
> >+
> >+    if (HasAttr(kNameSpaceID_None, nsGkAtoms::multiple)) {
> >+      return !IsValidEmailAddressList(value);
> >+    }
> >+
> >+    return !IsValidEmailAddress(value);
> 
> return HasAttr(kNameSpaceID_None, nsGkAtoms::multiple) ?
>        !IsValidEmailAddressList(value) :
>        !IsValidEmailAddress(value)
> 
> Or, IMHO even better, inline the IsValidEmailAddressList implementation into
> this function. Seems unnecessary to have a separate function for now. If we end
> up needing in it in other places then we can always break it out into a
> separate function then (which likely would live in another class).

I will probably need IsValidEmailAddressList for :vaild and :invalid pseudo-classes. I prefer to keep the function in this patch and I will see with the pseudo-classes patch what I do with them.

> >@@ -3422,23 +3478,98 @@ nsHTMLInputElement::GetValidationMessage
> >         default:
> >           key.Assign("TextElementSuffersFromBeingMissing");
> >       }
> >       rv = nsContentUtils::GetLocalizedString(nsContentUtils::eDOM_PROPERTIES,
> >                                               key.get(), message);
> >       aValidationMessage = message;
> >       break;
> >     }
> >+    case VALIDATION_MESSAGE_TYPE_MISMATCH:
> >+    {
> >+      nsXPIDLString message;
> >+      rv = nsContentUtils::GetLocalizedString(nsContentUtils::eDOM_PROPERTIES,
> >+                                              "ElementSuffersFromInvalidEmail",
> >+                                              message);
> >+      aValidationMessage = message;
> >+      break;
> >+    }
> 
> At least assert that mType == NS_FORM_INPUT_EMAIL

input type='url' can suffer from type mismatch too and the input url type patch has been r/sr so I prefer to not add an assert here because that would be wrong. I can open a follow-up to assert if an element suffers from something it should not.

> >+//static
> >+PRBool
> >+nsHTMLInputElement::IsValidEmailAddressList(const nsAString& aValue)
> >+{
> >+  nsCharSeparatedTokenizer tokenizer(aValue, ',', nsCharSeparatedTokenizer::HTML_WHITESPACE);
> >+
> >+  while (tokenizer.hasMoreTokens()) {
> >+    if (!IsValidEmailAddress(tokenizer.nextToken())) {
> >+      return PR_FALSE;
> >+    }
> >+  }
> >+
> >+  return !tokenizer.lastTokenEndedWithSeparator();
> >+}
> 
> Have you checked that this does the right thing with regards to values like:
> 
> ",foo@bar.com,   , foo@bar.com  ,  "
> 
> I.e. with regards to empty entries in the beginning, middle and end. And with
> regards to whitespace trimming. Hint, you might need to use
> lastTokenEndedWithSeparator()

I think the example you've proposed is tested. Actually, the empty token should be managed correctly:
- if the token is empty at the beginning or at the middle of the string, |nextToken| will return an empty string which will not a valid email address ;
- if the token is empty at the end of the string, as I call |return !tokenizer.lastTokenEndedWithSeparator();|, I think the list will be invalid.

> >+PRBool
> >+nsHTMLInputElement::IsValidEmailAddress(const nsAString& aValue)
> >+{
> >+  PRBool parsingDomain = PR_TRUE;
> >+  PRUint32 dotIdx = 0;
> >+  PRUint32 length = aValue.Length();
> >+
> >+  for (PRUint32 i=0; i<length; i++) {
> >+    PRUnichar c = aValue[length - i - 1];
> >+
> >+    if (parsingDomain) {
> >+      if (c == '@') {
> >+        // The domain part has to contain a dot (.).
> >+        if (dotIdx == 0) {
> >+          return PR_FALSE;
> >+        }
> >+        // The dot (.) can't be the first domain character.
> >+        if (dotIdx == i - 1) {
> >+          return PR_FALSE;
> >+        }
> >+        // The '@' is the first character, there is no username.
> >+        if (i == length - 1) {
> >+          return PR_FALSE;
> >+        }
> >+        parsingDomain = PR_FALSE;
> >+      } else if (c == '.') {
> >+        // The dot (.) can't be at the end of the address.
> >+        if (i == 0) {
> >+          return PR_FALSE;
> >+        }
> >+        dotIdx = i;
> >+      } else if (!(nsCRT::IsAsciiAlpha(c) || nsCRT::IsAsciiDigit(c) ||
> >+                   c == '-')) {
> >+        return PR_FALSE;
> >+      }
> >+    } else { // Parsing the username part.
> >+      if (!(nsCRT::IsAsciiAlpha(c) || nsCRT::IsAsciiDigit(c) ||
> >+            c == '.' || c == '!' || c == '#' || c == '$' || c == '%' ||
> >+            c == '&' || c == '\''|| c == '*' || c == '+' || c == '-' ||
> >+            c == '/' || c == '=' || c == '?' || c == '^' || c == '_' ||
> >+            c == '`' || c == '{' || c == '|' || c == '}' || c == '~')) {
> >+        return PR_FALSE;
> >+      }
> >+    }
> >+  }
> 
> Personally I think this would be more readable as two for-loops, rather than
> this state-machine approach. Up to you.

You're right. States machines are clearly less readable.

> >diff -r ae44e4c86c77 xpcom/ds/nsCharSeparatedTokenizer.h
> >@@ -151,19 +153,21 @@ public:
> > private:
> >     nsSubstring::const_char_iterator mIter, mEnd;
> >     PRPackedBool mLastTokenEndedWithSeparator;
> >     PRUnichar mSeparatorChar;
> >     PRUint32  mFlags;
> > 
> >     PRBool isWhitespace(PRUnichar aChar)
> >     {
> >-        return aChar <= ' ' &&
> >-               (aChar == ' ' || aChar == '\n' ||
> >-                aChar == '\r'|| aChar == '\t');
> >+      if (mFlags & HTML_WHITESPACE) {
> >+        return nsContentUtils::IsHTMLWhitespace(aChar);
> >+      }
> >+
> >+      return nsCRT::IsAsciiSpace(aChar);
> 
> return mFlags & HTML_WHITESPACE ?
>        nsContentUtils::IsHTMLWhitespace(aChar) :
>        nsCRT::IsAsciiSpace(aChar)

Ok.

> looks good otherwise. Would like to know that the nsCharSeparatedTokenizer does
> the right thing when it comes to empty tokens before putting my r=me on the
> patch though, as I suspect it currently does not.

For me nsCharSeparatedTokenizer is correctly used. Please, correct me if I'm wrong.

I'm going to update the patch for the other issues.
Comment 39 Jonas Sicking (:sicking) PTO Until July 5th 2010-05-26 17:30:51 PDT
> > At least assert that mType == NS_FORM_INPUT_EMAIL
> 
> input type='url' can suffer from type mismatch too and the input url type
> patch has been r/sr so I prefer to not add an assert here because that would
> be wrong. I can open a follow-up to assert if an element suffers from
> something it should not.

You should either have a if-statement or an assertion. Right now the code assumes that the type is email (as it always produces an email related message), but doesn't back that assumption up with an assertion.

Feel free to go either way, whichever solution you use (including the current one), you'll need to modify the code when merging the url patch.


> I think the example you've proposed is tested. Actually, the empty token should
> be managed correctly:
> - if the token is empty at the beginning or at the middle of the string,
> |nextToken| will return an empty string which will not a valid email address ;
> - if the token is empty at the end of the string, as I call |return
> !tokenizer.lastTokenEndedWithSeparator();|, I think the list will be invalid.

Doh! You are correct, I missed the call to lastTokenEndedWithSeparator at the end.
Comment 40 Mounir Lamouri (:mounir) 2010-05-26 17:53:43 PDT
(In reply to comment #39)
> > > At least assert that mType == NS_FORM_INPUT_EMAIL
> > 
> > input type='url' can suffer from type mismatch too and the input url type
> > patch has been r/sr so I prefer to not add an assert here because that would
> > be wrong. I can open a follow-up to assert if an element suffers from
> > something it should not.
> 
> You should either have a if-statement or an assertion. Right now the code
> assumes that the type is email (as it always produces an email related
> message), but doesn't back that assumption up with an assertion.
> 
> Feel free to go either way, whichever solution you use (including the current
> one), you'll need to modify the code when merging the url patch.

I've just checked input-url.patch and I realize previous me is probably smarter than I am ;) ... there is a NS_ERROR_UNEXPECTED if type != (email and url).
Comment 41 Mounir Lamouri (:mounir) 2010-05-26 18:59:48 PDT
Created attachment 447676 [details] [diff] [review]
Tests v3.4

I've added two cases: 'foo@b' and 'foo@foo..bar' (both invalids).
Comment 42 Mounir Lamouri (:mounir) 2010-05-26 19:00:21 PDT
Created attachment 447677 [details] [diff] [review]
Patch v6
Comment 43 Jonas Sicking (:sicking) PTO Until July 5th 2010-05-26 23:21:35 PDT
Comment on attachment 447677 [details] [diff] [review]
Patch v6

>+    case VALIDATION_MESSAGE_TYPE_MISMATCH:
>+    {
>+      nsXPIDLString message;
>+      rv = nsContentUtils::GetLocalizedString(nsContentUtils::eDOM_PROPERTIES,
>+                                              "ElementSuffersFromInvalidEmail",
>+                                              message);
>+      aValidationMessage = message;
>+      break;
>+    }

I still think this is wrong. There might be a separate patch that makes things correct, however IMO each bug should stand on its own. For example if we back out the url patch, or if it ends up never landing for whatever reason, we'd want to have an assertion or an if-check here.

r=me anyhow.
Comment 44 Mounir Lamouri (:mounir) 2010-05-27 04:27:23 PDT
Created attachment 447734 [details] [diff] [review]
Patch v6.1

r=smaug,sicking

I've added an assert.
Comment 45 Johnny Stenback (:jst, jst@mozilla.com) 2010-06-03 16:56:15 PDT
Comment on attachment 447734 [details] [diff] [review]
Patch v6.1

- In nsHTMLInputElement::IsValidEmailAddress():

+  // Parsing the username.
+  for (; i < length && aValue[i] != '@'; ++i) {
[...]
+  }
+
+  // There is no domain name, that's not a valid email address.
+  if (i == length) {
+    return PR_FALSE;
+  }
+
+  // The domain name can't begin with a dot.
+  if (aValue[++i] == '.') {

This can look past the end of the string if the input is "foo@", you should change the previous if check to take that case into account as well, or something.

- In xpcom/ds/nsCharSeparatedTokenizer.h:

+#include "nsContentUtils.h"

You can not depend on content code anywhere in xpcom/. We'll need a different approach to this...

     PRBool isWhitespace(PRUnichar aChar)
     {
-        return aChar <= ' ' &&
-               (aChar == ' ' || aChar == '\n' ||
-                aChar == '\r'|| aChar == '\t');
+      return (mFlags & HTML_WHITESPACE) ?
+             nsContentUtils::IsHTMLWhitespace(aChar) :
+             nsCRT::IsAsciiSpace(aChar);
     }

Given how this is used, the best approach might be to make this a template class that lets the user of this class specify a whitespace checking class as a template parameter, with this code providing one that uses the nsCRT::IsAsciiSpace() one for convenience.

sr- based on the fact that we need another approach to nsCharSeparatedTokenizer, otherwise this looks good.
Comment 46 Mounir Lamouri (:mounir) 2010-06-07 06:06:36 PDT
Created attachment 449610 [details] [diff] [review]
Patch v6.2

I fixed the issue with i>lenght and i've added a test.

For whitespace checking, I've used a function template but I'm wondering if a pointer function wouldn't be more suitable.
Comment 47 Mounir Lamouri (:mounir) 2010-06-07 06:58:46 PDT
Created attachment 449617 [details] [diff] [review]
Patch v6.3

I've changed nsCharSeparatedTokenizer name to nsCharSeparatedTokenizerTemplate and added nsCharSeparatedTokenizer which inherits from nsCharSeparatedTokenizerTemplate<> so callers can use nsCharSeparatedTokenizer whitout adding <>. Thanks to Olli for the suggestion.
Comment 48 Johnny Stenback (:jst, jst@mozilla.com) 2010-06-07 13:36:47 PDT
Comment on attachment 449617 [details] [diff] [review]
Patch v6.3

Looks good to me! sr=jst
Comment 49 Mounir Lamouri (:mounir) 2010-07-28 06:37:23 PDT
Created attachment 460861 [details] [diff] [review]
Patch v6.4

Updated to apply against current tip.
Removing nsCharTokenizer changes as they have been landed with bug 574570.
Removing todo's in the tests as bug 549475 has been landed.
Comment 50 Mounir Lamouri (:mounir) 2010-08-01 08:40:47 PDT
Created attachment 461895 [details] [diff] [review]
Patch v6.5

Yet-another-update: I've moved the changes for value sanitizing in another bug to prevent landing non-reviewed code (even if quite trivial).
Comment 51 Mounir Lamouri (:mounir) 2010-08-19 14:19:45 PDT
This is now in the tree:
rev: http://hg.mozilla.org/mozilla-central/rev/010d34a2c89e
Comment 52 Jean Claveau 2011-11-16 05:38:45 PST
Wouldn't it be more efficient/simple tu use à regexp for this kind of validation?

Something like that :
/^[a-zA-Z0-9\.\!\#...other special chars...\~]+@[a-z0-9\-]+(\.[a-z0-9\-]+)+$/

May work as needed and replace almost the 60 lines of IsValidEmailAddress.
I've always considered regexps as faster than algos so I had to ask you :).

I never developed for mozilla but i've seen that perl regexps are used for jsregexp and this one is perl compliant (if I'm not wrong).


Thanks in advance.
Comment 53 Daniel Holbert [:dholbert] 2011-11-27 18:16:02 PST
*** Bug 573489 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.