Closed Bug 555559 Opened 14 years ago Closed 14 years ago

Implement <input type="email">

Categories

(Core :: DOM: Core & HTML, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla2.0b5

People

(Reporter: mounir, Assigned: mounir)

References

()

Details

(Keywords: dev-doc-complete, html5, Whiteboard: [parity-opera])

Attachments

(1 file, 21 obsolete files)

37.09 KB, patch
Details | Diff | Splinter Review
HTML5 Forms introduces a new state for input elements. The email state basically lets the user put an email then it can be validated with the constraint validation api (bug 345624).
Other things could be done with follow-up bugs like implementation of multiple attribute.
Dolske: Any suggestions for adding in searching of email contacts?

Currently, nsFormFillController chooses between searching with the login manager or satchel.
http://hg.mozilla.org/mozilla-central/annotate/fb6fa1f88790/toolkit/components/satchel/src/nsFormFillController.cpp#l496

The nsIDOMHTMLInputElement is available here as mFocusedInput, so it could check the type and dynamically dispatch to the appropriate nsIFormAutoComplete.autoCompleteSearch. This would need some way to allow nsIFormAutoCompletes to register which types they support. I suppose nsILoginManager wouldn't need to duplicate the interface and nsLoginManager could register itself as type=password.. maybe.

Alternatively, satchel could just handle all of these as it does now, and it also gets the html input, so it can decide what to do with the special type such as finding results that match the input's name OR the input's type.

However, looking at Contacts from Labs, people/email probably will live in a separate database/table as it's more complex than just previous-value retrieval. So not overloading satchel might be better..
Blocks: 556817
Blocks: 558370
Blocks: 558651
Assignee: nobody → mounir.lamouri
Status: NEW → ASSIGNED
Attached patch Tests v1 (obsolete) — Splinter Review
Attachment #438894 - Flags: review?(Olli.Pettay)
Attached patch Patch v1 (obsolete) — Splinter Review
This bug follows bug 456229 and bug 558594 so, Olli, I think you are the best person to review it.
Attachment #438895 - Flags: review?(Olli.Pettay)
The input element in the email state uses the multiple attributes. I think that could be done in a follow-up.
Attached patch Tests v1.1 (obsolete) — Splinter Review
I've added tests to check the email is invalid when surrounded by spaces.
Attachment #438894 - Attachment is obsolete: true
Attachment #438902 - Flags: review?(Olli.Pettay)
Attachment #438894 - Flags: review?(Olli.Pettay)
Could you implement support for 'multiple' in this bug?
The patch isn't too large yet.
Comment on attachment 438902 [details] [diff] [review]
Tests v1.1

I hoping to get tests for "multiple"
Attachment #438902 - Flags: review?(Olli.Pettay) → review-
Comment on attachment 438895 [details] [diff] [review]
Patch v1

Same here.
Attachment #438895 - Flags: review?(Olli.Pettay) → review-
Blocks: 559309
Attached patch Tests v2 (obsolete) — Splinter Review
Attachment #438902 - Attachment is obsolete: true
Attachment #439119 - Flags: review?(Olli.Pettay)
Attached patch Patch v2 (obsolete) — Splinter Review
I thought I will have to do a dynamic check with the multiple attribute but as long as we check validity only with js calls, I think we can parse the value each time.
I think implementing :valid and :invalid pseudo-element will need some changes but as it will surely not be the only change, let's keep a trivial implementation until :valid implementation.

I wrote my own function to split the email list because I think calling some helpers would be a loss in performance. At the moment, the string is read twice at worst. It could be better (read once) if I wasn't using |IsValidEmailAddress()|. If you think performance is less important that readability in this situation, I can change that.

For the function validating a simple email address I think it's quite easy to understand it and it's probably much more better than a regexp.
Attachment #438895 - Attachment is obsolete: true
Attachment #439123 - Flags: review?(Olli.Pettay)
(In reply to comment #10)
> Created an attachment (id=439123) [details]
Seems like you attached a patch for type=search.

> For the function validating a simple email address I think it's quite easy to
> understand it and it's probably much more better than a regexp.
Curious, the validating will only set those :valid pseudo-elements and not prevent other inputs, yes? For example, Édwärd is an invalid email (both before and after the @), but that wouldn't prevent me from typing it in? This is important for a smarter formfill that can match by any string to suggest an actual email address.
Attached patch Patch v2.1 (obsolete) — Splinter Review
Indeed, it was the input type search patch.
Thank you for warning me Edward :)
Attachment #439123 - Attachment is obsolete: true
Attachment #439195 - Flags: review?(Olli.Pettay)
Attachment #439123 - Flags: review?(Olli.Pettay)
(In reply to comment #11)
> (In reply to comment #10)
> > For the function validating a simple email address I think it's quite easy to
> > understand it and it's probably much more better than a regexp.
> Curious, the validating will only set those :valid pseudo-elements and not
> prevent other inputs, yes? For example, Édwärd is an invalid email (both before
> and after the @), but that wouldn't prevent me from typing it in? This is
> important for a smarter formfill that can match by any string to suggest an
> actual email address.

Right, you can type whatever you want even if it's recognized as invalid. But the CSS pseudo-element will have to be quite real-time AFAIK so if you type "foo@foo." is will not be valid but when "foo@foo.c" will be typed, the form field should be recognized as valid. I don't think that would be a good idea to parse the entire string each time a user do an input in the field.
Anyway, I've not read this specification at the moment...
Comment on attachment 439119 [details] [diff] [review]
Tests v2

in for loops add space before and after <

You don't seem to test other white space characters than " "
Attachment #439119 - Flags: review?(Olli.Pettay) → review-
Comment on attachment 439195 [details] [diff] [review]
Patch v2.1

+nsHTMLInputElement::IsValidEmailAddressList(const nsAString& aValue)
doesn't handle all the white space characters.

There is a typo 'Trairing'.


Nit,
    }
    else if (c == ' ') {
->
    } else if (c == ' ') {



+    endEmail = length-1;
+  }
Space before and after -


I think adding set-of-comma-separated-tokens algorithm to
a method of nsContentUtils might be useful. It could take a
pointer to some function, which would be called for
each token.
Though, I wonder if we already have something like it somewhere. nsCRT::strtok(..., ", \t", ...) is quite close.
Attachment #439195 - Flags: review?(Olli.Pettay) → review-
So, HTML5 space characters are: ' ', '\n', '\r', '\t' and '\f'. I'm wondering why '\f' (first time I heard about this one) ? I'm also wondering why nsHTMLInputElement.cpp sets: |kWhitespace[] = "\n\r\t\b";|. Why is '\b' considered as a white space character ?

In addition, in my opinion, |email.value='f\roo@bar.com'| is a valid email value because '\r' should be stripped (like '\n') but at the moment, return values are not stripped but changed in white spaces. I'm wondering if I'm misinterpreting the specifications or if there is a bug here ? Webkit behavior is like our current behavior but Opera's is what I think may be the right one.

Indeed, we could add a set-of-comma-separated-tokens algorithm using strtok.
What I'm mainly concerned (in practice) is \t.

\b is there because of some historic reason. HTML5 may very well miss it, or
perhaps the reasons have changed. Bug 114997 seems to have just copied those
whitespace values from parser.
Attached patch Tests v3 (obsolete) — Splinter Review
I've added some tests in 'todo' because they need to have '\r' and '\n' stripped.
Attachment #439119 - Attachment is obsolete: true
Attachment #439938 - Flags: review?(Olli.Pettay)
Attached patch Patch v3 (obsolete) — Splinter Review
I did not add a function checking for the validity of a set of token separated by commas because email list are the only one needing to be checked however, I've added a function returning the next token in a token list separated by commas.
Attachment #439195 - Attachment is obsolete: true
Attachment #439939 - Flags: review?(Olli.Pettay)
Attached patch Tests v3.1 (obsolete) — Splinter Review
Now adding tests to the 'required' attribute tests as the patch has now to be applied on top of the required patch.
Attachment #439938 - Attachment is obsolete: true
Attachment #440188 - Flags: review?(Olli.Pettay)
Attachment #439938 - Flags: review?(Olli.Pettay)
Attached patch Patch v3.1 (obsolete) — Splinter Review
Updated patch now using the required patch (see bug 345822).
Attachment #439939 - Attachment is obsolete: true
Attachment #440190 - Flags: review?(Olli.Pettay)
Attachment #439939 - Flags: review?(Olli.Pettay)
Depends on: 345822
Comment on attachment 440188 [details] [diff] [review]
Tests v3.1

Few nits

>+++ b/content/html/content/test/test_bug555559.html	Tue Apr 20 12:43:49 2010 +0200
...
>+function todoCheckValidEmailAddress(element)
>+{
>+  todo(!element.validity.typeMismatch,
>+    "Element should not suffer from type mismatch");
Align "Element under !element
>+  todo(element.validity.valid, "Element should be valid");
>+  todo(element.checkValidity(), "Element should be valid");
>+  todo_is(element.validationMessage, '',
>+    "Validation message should be the empty string");
similar thing here

>+function checkValidEmailAddress(element)
>+{
>+  ok(!element.validity.typeMismatch,
>+    "Element should not suffer from type mismatch");
And here

>+  ok(element.validity.valid, "Element should be valid");
>+  ok(element.checkValidity(), "Element should be valid");
>+  is(element.validationMessage, '',
>+    "Validation message should be the empty string");
And here

>+for (var i=0; i<legalCharacters.length; i++)
Space before and after <, and { should be in the same
line as 'for'

>+for (var i=0; i<illegalCharacters.length; i++)
>+{
ditto

>+for (var i=0; i<illegalCharacters.length; i++)
>+{
and here


Could you add some tests for invalid event. 
I know test_bug345624-1.html has some tests, but would be great to make
sure that the event works properly also with "email"
Attachment #440188 - Flags: review?(Olli.Pettay) → review+
Attached patch Tests v3.2 (obsolete) — Splinter Review
r=smaug
Attachment #440188 - Attachment is obsolete: true
Comment on attachment 440190 [details] [diff] [review]
Patch v3.1

I assume there will be a new less-buggy patch coming ;)
And please add more testcases for parsing.
Attachment #440190 - Flags: review?(Olli.Pettay) → review-
Attached patch Tests v3.3 (obsolete) — Splinter Review
r=smaug

I've added one test to check the case when multiple commas are in the middle of a list. We can't do much more test related to token list in this patch because the only way we can test is checking the validity, not the different tokens.
Attachment #441350 - Attachment is obsolete: true
Attached patch Patch v4 (obsolete) — Splinter Review
I've cleaned up |IsValidEmailAddressList| method.
Attachment #440190 - Attachment is obsolete: true
Attachment #441507 - Flags: review?(Olli.Pettay)
Blocks: 562625
Comment on attachment 441507 [details] [diff] [review]
Patch v4


>+PRBool nsContentUtils::GetNextTokenFromCommaSeparatedList(nsAString::const_iterator& beginString,
>+                                                          const nsAString::const_iterator& endString,
>+                                                          nsAString::const_iterator& beginToken,
>+                                                          nsAString::const_iterator& endToken)
>+{
>+  // We are using |beginString| as an iterator. It prevents us to re-setting
>+  // |beginString| to |iter| at the end, especially to forget doing that ;).
>+
>+  while (beginString != endString && IsHTMLWhitespace(*beginString)) {
>+    // Igoring leading white spaces.
>+    ++beginString;
>+  }
>+
>+  beginToken = beginString;
>+
>+  while (beginString != endString && !IsHTMLWhitespace(*beginString) && *beginString != ',') {
>+    // The token ends when a white spaces or a comma is found.
>+    ++beginString;
>+  }
>+
>+  endToken = beginString;
>+
>+  while (beginString != endString && *beginString != ',') {
>+    // Ignoring trailing white spaces.
>+    ++beginString;
>+  }
>+
>+  if (*beginString == ',') {
>+    // If we are ending because of a comma, we don't want it to be in the
>+    // returned list.
>+    ++beginString;
>+
>+    // If we are ending with a comma, there must be a token following.
>+    return PR_TRUE;
>+  }
>+
>+  // Next time, we want to parse the list from here.
>+  beginString = beginString;
This looks wrong. And probably means that there is some testcase missing.


If you check/fix that, and hopefully even find a testecase for it, r=me

I think a second review would be good for this because it is easy to make
small mistakes in parsing.
Attachment #441507 - Flags: review?(Olli.Pettay) → review+
Attached patch Patch v4.1 (obsolete) — Splinter Review
Actually, the reason of this is explained at the beginning of the function:
>  // We are using |beginString| as an iterator. It prevents us to re-setting
>  // |beginString| to |iter| at the end, especially to forget doing that ;).

So, I've done a s/iter/beginString/ then, the final |beginString = iter;| became |beginString = beginString;| and I didn't notice it. This is a stupid mistake but harmless as it is not changing the behavior.
Attachment #441507 - Attachment is obsolete: true
Attachment #443314 - Flags: superreview?(jst)
Attachment #443314 - Flags: review?(jonas)
I meant just one another review, not review + sr ;)
Attachment #443314 - Flags: superreview?(jst)
Comment on attachment 443314 [details] [diff] [review]
Patch v4.1

Olli: You meant for you to do that other review, right? Not for another person to do it.

If you do want me to review please assign the r? back to me.
Attachment #443314 - Flags: review?(jonas) → review?(Olli.Pettay)
Comment on attachment 443314 [details] [diff] [review]
Patch v4.1

I meant another eyes :)
Attachment #443314 - Flags: review?(Olli.Pettay) → review?(jonas)
Could you use nsCharSeparatedTokenizer instead?
As I wondered in comment 15, we seem to have something for tokenization after all :) And nsCharSeparatedTokenizer has rather nice API.
Indeed, that sounds good. I will just have to add a flag to use |IsHTMLWhitespace| instead of the list here (which could be |NS_IsAsciiWhitespace| or |IsAsciiSpace|). By the way, do you have any idea why there is a check for |aChar <= ' '| in |isWhitespace| ? Is that intend to be an optimization ?
> PRBool isWhitespace(PRUnichar aChar)
> {
>   return aChar <= ' ' &&
>          (aChar == ' ' || aChar == '\n' ||
>           aChar == '\r'|| aChar == '\t');
> }
Attachment #443314 - Flags: review?(jonas)
Attached patch Patch v5 (obsolete) — Splinter Review
r=smaug

This patch is using |nsCharSeparatedTokenizer| instead of creating a method in 
|nsContentUtils|.
Attachment #443314 - Attachment is obsolete: true
Attachment #444677 - Flags: review?(jonas)
Blocks: 566348
Comment on attachment 444677 [details] [diff] [review]
Patch v5

>+nsHTMLInputElement::HasTypeMismatch()
>+{
>+  if (mType == NS_FORM_INPUT_EMAIL) {
>+    nsAutoString value;
>+    NS_ENSURE_SUCCESS(GetValue(value), PR_FALSE);
>+
>+    if (value.IsEmpty()) {
>+      return PR_FALSE;
>+    }
>+
>+    if (HasAttr(kNameSpaceID_None, nsGkAtoms::multiple)) {
>+      return !IsValidEmailAddressList(value);
>+    }
>+
>+    return !IsValidEmailAddress(value);

return HasAttr(kNameSpaceID_None, nsGkAtoms::multiple) ?
       !IsValidEmailAddressList(value) :
       !IsValidEmailAddress(value)

Or, IMHO even better, inline the IsValidEmailAddressList implementation into this function. Seems unnecessary to have a separate function for now. If we end up needing in it in other places then we can always break it out into a separate function then (which likely would live in another class).


>@@ -3422,23 +3478,98 @@ nsHTMLInputElement::GetValidationMessage
>         default:
>           key.Assign("TextElementSuffersFromBeingMissing");
>       }
>       rv = nsContentUtils::GetLocalizedString(nsContentUtils::eDOM_PROPERTIES,
>                                               key.get(), message);
>       aValidationMessage = message;
>       break;
>     }
>+    case VALIDATION_MESSAGE_TYPE_MISMATCH:
>+    {
>+      nsXPIDLString message;
>+      rv = nsContentUtils::GetLocalizedString(nsContentUtils::eDOM_PROPERTIES,
>+                                              "ElementSuffersFromInvalidEmail",
>+                                              message);
>+      aValidationMessage = message;
>+      break;
>+    }

At least assert that mType == NS_FORM_INPUT_EMAIL

>+//static
>+PRBool
>+nsHTMLInputElement::IsValidEmailAddressList(const nsAString& aValue)
>+{
>+  nsCharSeparatedTokenizer tokenizer(aValue, ',', nsCharSeparatedTokenizer::HTML_WHITESPACE);
>+
>+  while (tokenizer.hasMoreTokens()) {
>+    if (!IsValidEmailAddress(tokenizer.nextToken())) {
>+      return PR_FALSE;
>+    }
>+  }
>+
>+  return !tokenizer.lastTokenEndedWithSeparator();
>+}

Have you checked that this does the right thing with regards to values like:

",foo@bar.com,   , foo@bar.com  ,  "

I.e. with regards to empty entries in the beginning, middle and end. And with regards to whitespace trimming. Hint, you might need to use lastTokenEndedWithSeparator()

>+PRBool
>+nsHTMLInputElement::IsValidEmailAddress(const nsAString& aValue)
>+{
>+  PRBool parsingDomain = PR_TRUE;
>+  PRUint32 dotIdx = 0;
>+  PRUint32 length = aValue.Length();
>+
>+  for (PRUint32 i=0; i<length; i++) {
>+    PRUnichar c = aValue[length - i - 1];
>+
>+    if (parsingDomain) {
>+      if (c == '@') {
>+        // The domain part has to contain a dot (.).
>+        if (dotIdx == 0) {
>+          return PR_FALSE;
>+        }
>+        // The dot (.) can't be the first domain character.
>+        if (dotIdx == i - 1) {
>+          return PR_FALSE;
>+        }
>+        // The '@' is the first character, there is no username.
>+        if (i == length - 1) {
>+          return PR_FALSE;
>+        }
>+        parsingDomain = PR_FALSE;
>+      } else if (c == '.') {
>+        // The dot (.) can't be at the end of the address.
>+        if (i == 0) {
>+          return PR_FALSE;
>+        }
>+        dotIdx = i;
>+      } else if (!(nsCRT::IsAsciiAlpha(c) || nsCRT::IsAsciiDigit(c) ||
>+                   c == '-')) {
>+        return PR_FALSE;
>+      }
>+    } else { // Parsing the username part.
>+      if (!(nsCRT::IsAsciiAlpha(c) || nsCRT::IsAsciiDigit(c) ||
>+            c == '.' || c == '!' || c == '#' || c == '$' || c == '%' ||
>+            c == '&' || c == '\''|| c == '*' || c == '+' || c == '-' ||
>+            c == '/' || c == '=' || c == '?' || c == '^' || c == '_' ||
>+            c == '`' || c == '{' || c == '|' || c == '}' || c == '~')) {
>+        return PR_FALSE;
>+      }
>+    }
>+  }

Personally I think this would be more readable as two for-loops, rather than this state-machine approach. Up to you.

>diff -r ae44e4c86c77 xpcom/ds/nsCharSeparatedTokenizer.h
>@@ -151,19 +153,21 @@ public:
> private:
>     nsSubstring::const_char_iterator mIter, mEnd;
>     PRPackedBool mLastTokenEndedWithSeparator;
>     PRUnichar mSeparatorChar;
>     PRUint32  mFlags;
> 
>     PRBool isWhitespace(PRUnichar aChar)
>     {
>-        return aChar <= ' ' &&
>-               (aChar == ' ' || aChar == '\n' ||
>-                aChar == '\r'|| aChar == '\t');
>+      if (mFlags & HTML_WHITESPACE) {
>+        return nsContentUtils::IsHTMLWhitespace(aChar);
>+      }
>+
>+      return nsCRT::IsAsciiSpace(aChar);

return mFlags & HTML_WHITESPACE ?
       nsContentUtils::IsHTMLWhitespace(aChar) :
       nsCRT::IsAsciiSpace(aChar)

looks good otherwise. Would like to know that the nsCharSeparatedTokenizer does the right thing when it comes to empty tokens before putting my r=me on the patch though, as I suspect it currently does not.
Comment on attachment 444677 [details] [diff] [review]
Patch v5

Unsetting request. Rerequest if the current patch is indeed correct.
Attachment #444677 - Flags: review?(jonas)
(In reply to comment #36)
> (From update of attachment 444677 [details] [diff] [review])
> >+nsHTMLInputElement::HasTypeMismatch()
> >+{
> >+  if (mType == NS_FORM_INPUT_EMAIL) {
> >+    nsAutoString value;
> >+    NS_ENSURE_SUCCESS(GetValue(value), PR_FALSE);
> >+
> >+    if (value.IsEmpty()) {
> >+      return PR_FALSE;
> >+    }
> >+
> >+    if (HasAttr(kNameSpaceID_None, nsGkAtoms::multiple)) {
> >+      return !IsValidEmailAddressList(value);
> >+    }
> >+
> >+    return !IsValidEmailAddress(value);
> 
> return HasAttr(kNameSpaceID_None, nsGkAtoms::multiple) ?
>        !IsValidEmailAddressList(value) :
>        !IsValidEmailAddress(value)
> 
> Or, IMHO even better, inline the IsValidEmailAddressList implementation into
> this function. Seems unnecessary to have a separate function for now. If we end
> up needing in it in other places then we can always break it out into a
> separate function then (which likely would live in another class).

I will probably need IsValidEmailAddressList for :vaild and :invalid pseudo-classes. I prefer to keep the function in this patch and I will see with the pseudo-classes patch what I do with them.

> >@@ -3422,23 +3478,98 @@ nsHTMLInputElement::GetValidationMessage
> >         default:
> >           key.Assign("TextElementSuffersFromBeingMissing");
> >       }
> >       rv = nsContentUtils::GetLocalizedString(nsContentUtils::eDOM_PROPERTIES,
> >                                               key.get(), message);
> >       aValidationMessage = message;
> >       break;
> >     }
> >+    case VALIDATION_MESSAGE_TYPE_MISMATCH:
> >+    {
> >+      nsXPIDLString message;
> >+      rv = nsContentUtils::GetLocalizedString(nsContentUtils::eDOM_PROPERTIES,
> >+                                              "ElementSuffersFromInvalidEmail",
> >+                                              message);
> >+      aValidationMessage = message;
> >+      break;
> >+    }
> 
> At least assert that mType == NS_FORM_INPUT_EMAIL

input type='url' can suffer from type mismatch too and the input url type patch has been r/sr so I prefer to not add an assert here because that would be wrong. I can open a follow-up to assert if an element suffers from something it should not.

> >+//static
> >+PRBool
> >+nsHTMLInputElement::IsValidEmailAddressList(const nsAString& aValue)
> >+{
> >+  nsCharSeparatedTokenizer tokenizer(aValue, ',', nsCharSeparatedTokenizer::HTML_WHITESPACE);
> >+
> >+  while (tokenizer.hasMoreTokens()) {
> >+    if (!IsValidEmailAddress(tokenizer.nextToken())) {
> >+      return PR_FALSE;
> >+    }
> >+  }
> >+
> >+  return !tokenizer.lastTokenEndedWithSeparator();
> >+}
> 
> Have you checked that this does the right thing with regards to values like:
> 
> ",foo@bar.com,   , foo@bar.com  ,  "
> 
> I.e. with regards to empty entries in the beginning, middle and end. And with
> regards to whitespace trimming. Hint, you might need to use
> lastTokenEndedWithSeparator()

I think the example you've proposed is tested. Actually, the empty token should be managed correctly:
- if the token is empty at the beginning or at the middle of the string, |nextToken| will return an empty string which will not a valid email address ;
- if the token is empty at the end of the string, as I call |return !tokenizer.lastTokenEndedWithSeparator();|, I think the list will be invalid.

> >+PRBool
> >+nsHTMLInputElement::IsValidEmailAddress(const nsAString& aValue)
> >+{
> >+  PRBool parsingDomain = PR_TRUE;
> >+  PRUint32 dotIdx = 0;
> >+  PRUint32 length = aValue.Length();
> >+
> >+  for (PRUint32 i=0; i<length; i++) {
> >+    PRUnichar c = aValue[length - i - 1];
> >+
> >+    if (parsingDomain) {
> >+      if (c == '@') {
> >+        // The domain part has to contain a dot (.).
> >+        if (dotIdx == 0) {
> >+          return PR_FALSE;
> >+        }
> >+        // The dot (.) can't be the first domain character.
> >+        if (dotIdx == i - 1) {
> >+          return PR_FALSE;
> >+        }
> >+        // The '@' is the first character, there is no username.
> >+        if (i == length - 1) {
> >+          return PR_FALSE;
> >+        }
> >+        parsingDomain = PR_FALSE;
> >+      } else if (c == '.') {
> >+        // The dot (.) can't be at the end of the address.
> >+        if (i == 0) {
> >+          return PR_FALSE;
> >+        }
> >+        dotIdx = i;
> >+      } else if (!(nsCRT::IsAsciiAlpha(c) || nsCRT::IsAsciiDigit(c) ||
> >+                   c == '-')) {
> >+        return PR_FALSE;
> >+      }
> >+    } else { // Parsing the username part.
> >+      if (!(nsCRT::IsAsciiAlpha(c) || nsCRT::IsAsciiDigit(c) ||
> >+            c == '.' || c == '!' || c == '#' || c == '$' || c == '%' ||
> >+            c == '&' || c == '\''|| c == '*' || c == '+' || c == '-' ||
> >+            c == '/' || c == '=' || c == '?' || c == '^' || c == '_' ||
> >+            c == '`' || c == '{' || c == '|' || c == '}' || c == '~')) {
> >+        return PR_FALSE;
> >+      }
> >+    }
> >+  }
> 
> Personally I think this would be more readable as two for-loops, rather than
> this state-machine approach. Up to you.

You're right. States machines are clearly less readable.

> >diff -r ae44e4c86c77 xpcom/ds/nsCharSeparatedTokenizer.h
> >@@ -151,19 +153,21 @@ public:
> > private:
> >     nsSubstring::const_char_iterator mIter, mEnd;
> >     PRPackedBool mLastTokenEndedWithSeparator;
> >     PRUnichar mSeparatorChar;
> >     PRUint32  mFlags;
> > 
> >     PRBool isWhitespace(PRUnichar aChar)
> >     {
> >-        return aChar <= ' ' &&
> >-               (aChar == ' ' || aChar == '\n' ||
> >-                aChar == '\r'|| aChar == '\t');
> >+      if (mFlags & HTML_WHITESPACE) {
> >+        return nsContentUtils::IsHTMLWhitespace(aChar);
> >+      }
> >+
> >+      return nsCRT::IsAsciiSpace(aChar);
> 
> return mFlags & HTML_WHITESPACE ?
>        nsContentUtils::IsHTMLWhitespace(aChar) :
>        nsCRT::IsAsciiSpace(aChar)

Ok.

> looks good otherwise. Would like to know that the nsCharSeparatedTokenizer does
> the right thing when it comes to empty tokens before putting my r=me on the
> patch though, as I suspect it currently does not.

For me nsCharSeparatedTokenizer is correctly used. Please, correct me if I'm wrong.

I'm going to update the patch for the other issues.
> > At least assert that mType == NS_FORM_INPUT_EMAIL
> 
> input type='url' can suffer from type mismatch too and the input url type
> patch has been r/sr so I prefer to not add an assert here because that would
> be wrong. I can open a follow-up to assert if an element suffers from
> something it should not.

You should either have a if-statement or an assertion. Right now the code assumes that the type is email (as it always produces an email related message), but doesn't back that assumption up with an assertion.

Feel free to go either way, whichever solution you use (including the current one), you'll need to modify the code when merging the url patch.


> I think the example you've proposed is tested. Actually, the empty token should
> be managed correctly:
> - if the token is empty at the beginning or at the middle of the string,
> |nextToken| will return an empty string which will not a valid email address ;
> - if the token is empty at the end of the string, as I call |return
> !tokenizer.lastTokenEndedWithSeparator();|, I think the list will be invalid.

Doh! You are correct, I missed the call to lastTokenEndedWithSeparator at the end.
(In reply to comment #39)
> > > At least assert that mType == NS_FORM_INPUT_EMAIL
> > 
> > input type='url' can suffer from type mismatch too and the input url type
> > patch has been r/sr so I prefer to not add an assert here because that would
> > be wrong. I can open a follow-up to assert if an element suffers from
> > something it should not.
> 
> You should either have a if-statement or an assertion. Right now the code
> assumes that the type is email (as it always produces an email related
> message), but doesn't back that assumption up with an assertion.
> 
> Feel free to go either way, whichever solution you use (including the current
> one), you'll need to modify the code when merging the url patch.

I've just checked input-url.patch and I realize previous me is probably smarter than I am ;) ... there is a NS_ERROR_UNEXPECTED if type != (email and url).
Attached patch Tests v3.4 (obsolete) — Splinter Review
I've added two cases: 'foo@b' and 'foo@foo..bar' (both invalids).
Attachment #441505 - Attachment is obsolete: true
Attached patch Patch v6 (obsolete) — Splinter Review
Attachment #444677 - Attachment is obsolete: true
Attachment #447677 - Flags: review?(jonas)
Comment on attachment 447677 [details] [diff] [review]
Patch v6

>+    case VALIDATION_MESSAGE_TYPE_MISMATCH:
>+    {
>+      nsXPIDLString message;
>+      rv = nsContentUtils::GetLocalizedString(nsContentUtils::eDOM_PROPERTIES,
>+                                              "ElementSuffersFromInvalidEmail",
>+                                              message);
>+      aValidationMessage = message;
>+      break;
>+    }

I still think this is wrong. There might be a separate patch that makes things correct, however IMO each bug should stand on its own. For example if we back out the url patch, or if it ends up never landing for whatever reason, we'd want to have an assertion or an if-check here.

r=me anyhow.
Attachment #447677 - Flags: review?(jonas) → review+
Attached patch Patch v6.1 (obsolete) — Splinter Review
r=smaug,sicking

I've added an assert.
Attachment #447677 - Attachment is obsolete: true
Attachment #447734 - Flags: superreview?(jst)
Comment on attachment 447734 [details] [diff] [review]
Patch v6.1

- In nsHTMLInputElement::IsValidEmailAddress():

+  // Parsing the username.
+  for (; i < length && aValue[i] != '@'; ++i) {
[...]
+  }
+
+  // There is no domain name, that's not a valid email address.
+  if (i == length) {
+    return PR_FALSE;
+  }
+
+  // The domain name can't begin with a dot.
+  if (aValue[++i] == '.') {

This can look past the end of the string if the input is "foo@", you should change the previous if check to take that case into account as well, or something.

- In xpcom/ds/nsCharSeparatedTokenizer.h:

+#include "nsContentUtils.h"

You can not depend on content code anywhere in xpcom/. We'll need a different approach to this...

     PRBool isWhitespace(PRUnichar aChar)
     {
-        return aChar <= ' ' &&
-               (aChar == ' ' || aChar == '\n' ||
-                aChar == '\r'|| aChar == '\t');
+      return (mFlags & HTML_WHITESPACE) ?
+             nsContentUtils::IsHTMLWhitespace(aChar) :
+             nsCRT::IsAsciiSpace(aChar);
     }

Given how this is used, the best approach might be to make this a template class that lets the user of this class specify a whitespace checking class as a template parameter, with this code providing one that uses the nsCRT::IsAsciiSpace() one for convenience.

sr- based on the fact that we need another approach to nsCharSeparatedTokenizer, otherwise this looks good.
Attachment #447734 - Flags: superreview?(jst) → superreview-
Attached patch Patch v6.2 (obsolete) — Splinter Review
I fixed the issue with i>lenght and i've added a test.

For whitespace checking, I've used a function template but I'm wondering if a pointer function wouldn't be more suitable.
Attachment #447676 - Attachment is obsolete: true
Attachment #447734 - Attachment is obsolete: true
Attachment #449610 - Flags: superreview?(jst)
Attached patch Patch v6.3 (obsolete) — Splinter Review
I've changed nsCharSeparatedTokenizer name to nsCharSeparatedTokenizerTemplate and added nsCharSeparatedTokenizer which inherits from nsCharSeparatedTokenizerTemplate<> so callers can use nsCharSeparatedTokenizer whitout adding <>. Thanks to Olli for the suggestion.
Attachment #449610 - Attachment is obsolete: true
Attachment #449617 - Flags: superreview?(jst)
Attachment #449610 - Flags: superreview?(jst)
Comment on attachment 449617 [details] [diff] [review]
Patch v6.3

Looks good to me! sr=jst
Attachment #449617 - Flags: superreview?(jst) → superreview+
Blocks: 558788
Blocks: 561635
Depends on: 581652
Attached patch Patch v6.4 (obsolete) — Splinter Review
Updated to apply against current tip.
Removing nsCharTokenizer changes as they have been landed with bug 574570.
Removing todo's in the tests as bug 549475 has been landed.
Attachment #449617 - Attachment is obsolete: true
Blocks: 583586
Attached patch Patch v6.5Splinter Review
Yet-another-update: I've moved the changes for value sanitizing in another bug to prevent landing non-reviewed code (even if quite trivial).
Attachment #460861 - Attachment is obsolete: true
This is now in the tree:
rev: http://hg.mozilla.org/mozilla-central/rev/010d34a2c89e
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b5
Blocks: 581596
Depends on: 618876
Wouldn't it be more efficient/simple tu use à regexp for this kind of validation?

Something like that :
/^[a-zA-Z0-9\.\!\#...other special chars...\~]+@[a-z0-9\-]+(\.[a-z0-9\-]+)+$/

May work as needed and replace almost the 60 lines of IsValidEmailAddress.
I've always considered regexps as faster than algos so I had to ask you :).

I never developed for mozilla but i've seen that perl regexps are used for jsregexp and this one is perl compliant (if I'm not wrong).


Thanks in advance.
You need to log in before you can comment on or make changes to this bug.