Closed Bug 1347176 Opened 7 years ago Closed 7 years ago

Implement label element extraction logic of an input field for filling form

Categories

(Toolkit :: Form Manager, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: selee, Assigned: selee)

References

(Blocks 1 open bug)

Details

(Whiteboard: [form autofill:M2])

Attachments

(1 file)

There are two usage of Label and Input elements (see [1] to see the examples):
1. Use `for` attr in Label to indicate which input is for.
2. Input element is placed in Label element.

The test page [2] provides a simple way to verify if the logic can handle the case of a field has the only clue which contains the relative label text for recognizing the field name.

[1] http://codepen.io/seanlee99/pen/yMNXMN
[2] https://luke-chang.github.io/autofill-demo/heuristics/parseable_test.html
Hey MattN,

This WIP patch provides the ability to recognize the fields which contain the label text only. (see [2] in comment 0)
The regexp is a draft to verify the implementation. The relative mochitest will be included later. Thanks.
See Also: → 1339726
Comment on attachment 8847144 [details]
Bug 1347176 - Implement label element extraction logic of an input field for filling form.;

https://reviewboard.mozilla.org/r/120168/#review122266

::: browser/extensions/formautofill/FormAutofillHeuristics.jsm:32
(Diff revision 1)
>      "country",
>      "tel",
>      "email",
>    ],
>  
> +  VALID_LABELS: {

I notice that you're missing the state regexes…

::: browser/extensions/formautofill/FormAutofillHeuristics.jsm:33
(Diff revision 1)
>      "tel",
>      "email",
>    ],
>  
> +  VALID_LABELS: {
> +    "organization": new RegExp(

There should be comments here pointing to the original chromium code and comments beside each regexp indicating the original Chromium variable name and any changes from their Regexp should be documented.

::: browser/extensions/formautofill/FormAutofillHeuristics.jsm:70
(Diff revision 1)
> +      "|우편.?번호",               // ko-KR
> +      "i"

Please add unit tests for multi-byte characters. Are you sure the "i" flag works properly for all locales? Do we need the "u" flag?

You should be able to use xpcshell tests instead of mochitests.

::: browser/extensions/formautofill/FormAutofillUtils.jsm:39
(Diff revision 1)
>      return fullName;
>    },
> +
> +  findLabelElement(element) {
> +    let document = element.ownerDocument;
> +    let label = document.querySelector('label[for="' + element.id + '"]');

We should use the standard DOM API of element.labels but I just realized we don't support it yet: bug 1339726. Does your version exactly match the spec for .labels with regards to things like case-sensitivity,  leading/trailing white-space, etc.?

::: browser/extensions/formautofill/FormAutofillUtils.jsm:39
(Diff revision 1)
>      return fullName;
>    },
> +
> +  findLabelElement(element) {
> +    let document = element.ownerDocument;
> +    let label = document.querySelector('label[for="' + element.id + '"]');

There can be multiple labels for an element. What does Chromium do with that? We may want to consider more than one.

::: browser/extensions/formautofill/FormAutofillUtils.jsm:44
(Diff revision 1)
> +    if (!element.parentNode) {
> +      return null;
> +    }
> +
> +    let parent = element.parentNode;
> +    if (!parent) {
> +      return null;
> +    }

The first `if` block seems redundant unless I'm missing something.
Attachment #8847144 - Flags: review?(MattN+bmo)
Whiteboard: [form autofill:M2]
Depends on: 1349489
Comment on attachment 8847144 [details]
Bug 1347176 - Implement label element extraction logic of an input field for filling form.;

https://reviewboard.mozilla.org/r/120168/#review122266

> I notice that you're missing the state regexes…

Move these codes to bug 1349490. Let's discuss this there.

> There should be comments here pointing to the original chromium code and comments beside each regexp indicating the original Chromium variable name and any changes from their Regexp should be documented.

Move these codes to bug 1349490. Let's discuss this there.

> Please add unit tests for multi-byte characters. Are you sure the "i" flag works properly for all locales? Do we need the "u" flag?
> 
> You should be able to use xpcshell tests instead of mochitests.

Move these codes to bug 1349490. Let's discuss this there.

> There can be multiple labels for an element. What does Chromium do with that? We may want to consider more than one.

Chromium will concat all label strings (Form E in [1]), so the latest "findLabelElements" function will provide all possible labels in an array. It is more handy for prediction logic to determine the field name.

[1] https://luke-chang.github.io/autofill-demo/heuristics/parseable_test.html

> We should use the standard DOM API of element.labels but I just realized we don't support it yet: bug 1339726. Does your version exactly match the spec for .labels with regards to things like case-sensitivity,  leading/trailing white-space, etc.?

Do you mean the following labels should be found even the input id in `for` attr is not exactly the same with the input?

<input type="text" id="test_label">
<label for="test_label">(A) addressLevel2:</label>
<label for="test_LABEL">(B) addressLevel2:</label>
<label for="   test_label   ">(C) addressLevel2:</label>

If so, my patch did not handle these cases correctly.
The patch contains the test case first, and I will fix them in the final patch.
Comment on attachment 8847144 [details]
Bug 1347176 - Implement label element extraction logic of an input field for filling form.;

https://reviewboard.mozilla.org/r/120168/#review122266

> Do you mean the following labels should be found even the input id in `for` attr is not exactly the same with the input?
> 
> <input type="text" id="test_label">
> <label for="test_label">(A) addressLevel2:</label>
> <label for="test_LABEL">(B) addressLevel2:</label>
> <label for="   test_label   ">(C) addressLevel2:</label>
> 
> If so, my patch did not handle these cases correctly.
> The patch contains the test case first, and I will fix them in the final patch.

I've fixed these cases in the latest patch.
Comment on attachment 8847144 [details]
Bug 1347176 - Implement label element extraction logic of an input field for filling form.;

https://reviewboard.mozilla.org/r/120168/#review122266

> I've fixed these cases in the latest patch.

No, I didn't mean that, what I said is that you should make sure your implementation matches the HTML spec behaviour for these edge cases. It seems like the spec requires an exact match so you should revert the code change and fix the test to make sure we don't match labels that vary by whitespace or case. I also checked that Chromium does an exact match.
Comment on attachment 8847144 [details]
Bug 1347176 - Implement label element extraction logic of an input field for filling form.;

https://reviewboard.mozilla.org/r/120168/#review130964

Thanks for splitting the patch btw. unfortunately I was still slow to review it because it required looking up various details in the HTML spec.

::: browser/extensions/formautofill/FormAutofillUtils.jsm:40
(Diff revision 5)
>    },
> +
> +  findLabelElements(element) {
> +    let document = element.ownerDocument;
> +    let labels = [];
> +    for (let label of document.querySelectorAll('label[for]')) {

This seems quite inefficient since we'll need to call this method for every field. If we want to land something before bug 1339726 then either we need to make sure bug 1339726 is an MVP blocker and accept the inefficiencies in the short term or change the approach to find all the labels for the whole form and do a mapping rather than one field at a time so that way we don't have to find the labels for every element separately.

::: browser/extensions/formautofill/FormAutofillUtils.jsm:41
(Diff revision 5)
> +
> +  findLabelElements(element) {
> +    let document = element.ownerDocument;
> +    let labels = [];
> +    for (let label of document.querySelectorAll('label[for]')) {
> +      if (element.id.toLowerCase() == label.getAttribute("for").trim().toLowerCase()) {

use `label.htmlFor`. You should almost always prefer the DOM property over attributes for performance and because it sometimes does extra validation/normalization.

::: browser/extensions/formautofill/FormAutofillUtils.jsm:55
(Diff revision 5)
> +    do {
> +      if (parent.tagName == "LABEL") {
> +        log.debug("Label found in input's parent.");
> +        return [parent];
> +      }
> +      parent = parent.parentNode;
> +    } while (parent);

This also seems quite inefficient to do for every element. Since we do support `HTMLLabelElement.control` I think it would be more efficient and more consistent to use it instead but really we should get bug 1339726 implemented and that bug number should be mentioned in whatever code we land in the meantime.

::: browser/extensions/formautofill/FormAutofillUtils.jsm:55
(Diff revision 5)
> +    do {
> +      if (parent.tagName == "LABEL") {
> +        log.debug("Label found in input's parent.");
> +        return [parent];
> +      }
> +      parent = parent.parentNode;
> +    } while (parent);

> If the for attribute is not specified, but the label element has a labelable element descendant, then the first such descendant in tree order is the label element's labeled control.

This doesn't exactly match the HTML spec[1] in two ways:
1. Only the first "labelable element descendant" would be labelled by an ancestor not using @for. Your algorithm labels any given `element` by a label ancestor. Checking `label.control` matches here would fix that.
2. You should only be considering ancestor labels where "the for attribute is not specified". This is a case where you would want to use hasAttribute instead of the content property since the attribute presence is actually what you care about.

It may be useful to check Chromium behaviour on these two issues but definitely we should add test cases for both of these cases.

[1] https://html.spec.whatwg.org/multipage/forms.html#labeled-control

::: browser/extensions/formautofill/FormAutofillUtils.jsm:57
(Diff revision 5)
> +    if (!parent) {
> +      return null;
> +    }
> +    do {
> +      if (parent.tagName == "LABEL") {
> +        log.debug("Label found in input's parent.");

Nit: It may not be the direct parent, it could be another ancestor.

::: browser/extensions/formautofill/test/unit/test_findLabelElements.js:37
(Diff revision 5)
> +    description: "\"for\" attribute used to indicate input by multiple labels.",
> +    document: `<label id="labelD1" for="typeD">label type D1</label>
> +               <label id="labelD2" for="typeD">label type D2</label>
> +               <label id="labelD3" for="TYPED">label type D3</label>
> +               <label id="labelD4" for="  typeD  ">label type D4</label>
> +               <input id="typeD" type="text">`,

You should also have a testcase (or modify this one) where the <input>'s ID has leading+trailing whitespace.
Attachment #8847144 - Flags: review?(MattN+bmo)
Comment on attachment 8847144 [details]
Bug 1347176 - Implement label element extraction logic of an input field for filling form.;

https://reviewboard.mozilla.org/r/120168/#review130964

> This seems quite inefficient since we'll need to call this method for every field. If we want to land something before bug 1339726 then either we need to make sure bug 1339726 is an MVP blocker and accept the inefficiencies in the short term or change the approach to find all the labels for the whole form and do a mapping rather than one field at a time so that way we don't have to find the labels for every element separately.

I use the following codes to reduce the iteration of the label elements.
`let form = element.form ? element.form : element.ownerDocument;`
`for (let label of form.querySelectorAll('label[for]')) {`

If `element` is not in a form, we still have to traverse all `label[for]`. I think it's fine because the element passed into `findLabelElements` will be in a FormLike.

> use `label.htmlFor`. You should almost always prefer the DOM property over attributes for performance and because it sometimes does extra validation/normalization.

label.htmlFor.trim().toLowerCase() is used in the patch now.

> This also seems quite inefficient to do for every element. Since we do support `HTMLLabelElement.control` I think it would be more efficient and more consistent to use it instead but really we should get bug 1339726 implemented and that bug number should be mentioned in whatever code we land in the meantime.

See below.

> > If the for attribute is not specified, but the label element has a labelable element descendant, then the first such descendant in tree order is the label element's labeled control.
> 
> This doesn't exactly match the HTML spec[1] in two ways:
> 1. Only the first "labelable element descendant" would be labelled by an ancestor not using @for. Your algorithm labels any given `element` by a label ancestor. Checking `label.control` matches here would fix that.
> 2. You should only be considering ancestor labels where "the for attribute is not specified". This is a case where you would want to use hasAttribute instead of the content property since the attribute presence is actually what you care about.
> 
> It may be useful to check Chromium behaviour on these two issues but definitely we should add test cases for both of these cases.
> 
> [1] https://html.spec.whatwg.org/multipage/forms.html#labeled-control

The final condition is like this:
`parent.tagName == "LABEL" && parent.control == element && !parent.hasAttribute("for")`

The relative test is added as well.

> You should also have a testcase (or modify this one) where the <input>'s ID has leading+trailing whitespace.

Added one more test with an input like this: `<input id="   typeE  " type="text">`
Comment on attachment 8847144 [details]
Bug 1347176 - Implement label element extraction logic of an input field for filling form.;

https://reviewboard.mozilla.org/r/120168/#review130964

> I use the following codes to reduce the iteration of the label elements.
> `let form = element.form ? element.form : element.ownerDocument;`
> `for (let label of form.querySelectorAll('label[for]')) {`
> 
> If `element` is not in a form, we still have to traverse all `label[for]`. I think it's fine because the element passed into `findLabelElements` will be in a FormLike.

That doesn't really address what I think is the main performance issue: that we're running this same querySelectorAll for every element. It also makes the code less accurate since I don't think there is any requirement that a <label> and <input> are in the same <form> and it doesn't help with form-less cases so please revert the <form> optimization. Elements in a FormLike don't necessarily have a real form. If you wanted to work with FormLike you would use the `rootElement` of the FormLike to anchor the querySelectorAll but as I said above that can exclude valid <label> unnecesarily.

> label.htmlFor.trim().toLowerCase() is used in the patch now.

As I explained in comment 10, you shouldn't be trimming or lowercasing.

> The final condition is like this:
> `parent.tagName == "LABEL" && parent.control == element && !parent.hasAttribute("for")`
> 
> The relative test is added as well.

Thanks, that should help to ensure we don't have any false positives.
Comment on attachment 8847144 [details]
Bug 1347176 - Implement label element extraction logic of an input field for filling form.;

https://reviewboard.mozilla.org/r/120168/#review130964

> See below.

It looks like you didn't include the bug number yet.
Comment on attachment 8847144 [details]
Bug 1347176 - Implement label element extraction logic of an input field for filling form.;

https://reviewboard.mozilla.org/r/120168/#review132706

::: browser/extensions/formautofill/FormAutofillUtils.jsm:37
(Diff revision 6)
>        fullName += " " + lastName;
>      }
>      return fullName;
>    },
> +
> +  findLabelElements(element) {

I still think we should do this once for the formlike instead of once per element but if John plans to implement the proper efficient solution very soon then I guess it's fine to land if you mention that bug number and mention in a comment that this is going to be more efficient after.

::: browser/extensions/formautofill/FormAutofillUtils.jsm:40
(Diff revision 6)
>    },
> +
> +  findLabelElements(element) {
> +    let form = element.form ? element.form : element.ownerDocument;
> +    let labels = [];
> +    for (let label of form.querySelectorAll('label[for]')) {

Nit: use double quotes

::: browser/extensions/formautofill/FormAutofillUtils.jsm:41
(Diff revision 6)
> +
> +  findLabelElements(element) {
> +    let form = element.form ? element.form : element.ownerDocument;
> +    let labels = [];
> +    for (let label of form.querySelectorAll('label[for]')) {
> +      if (element.id.trim().toLowerCase() == label.htmlFor.trim().toLowerCase()) {

Fix this as I mention in comment 10.

::: browser/extensions/formautofill/test/unit/test_findLabelElements.js:53
(Diff revision 6)
> +               <label id="labelE2" for="typeE  ">label type E2</label>
> +               <label id="labelE3" for="  TYPEe">label type E3</label>
> +               <label id="labelE4" for="  typeE  ">label type E4</label>
> +               <input id="   typeE  " type="text">`,
> +    inputId: "   typeE  ",
> +    expectedLabelIds: ["labelE1", "labelE2", "labelE3", "labelE4"],

[] should be expected
Attachment #8847144 - Flags: review?(MattN+bmo)
Comment on attachment 8847144 [details]
Bug 1347176 - Implement label element extraction logic of an input field for filling form.;

https://reviewboard.mozilla.org/r/120168/#review132706

> I still think we should do this once for the formlike instead of once per element but if John plans to implement the proper efficient solution very soon then I guess it's fine to land if you mention that bug number and mention in a comment that this is going to be more efficient after.

The comment mentions bug 1339726 is going to improve the performance.
Comment on attachment 8847144 [details]
Bug 1347176 - Implement label element extraction logic of an input field for filling form.;

https://reviewboard.mozilla.org/r/120168/#review132856

::: browser/extensions/formautofill/test/unit/test_findLabelElements.js:72
(Diff revision 7)
> +    document: `<label id="labelG1" for="typeG">label type G1</label>
> +               <form>
> +                 <label id="labelG2" for="typeG">label type G2</label>
> +                 <input id="typeG" type="text">
> +               </form>
> +               <label id="labelG3" for="typeG">label type G3</label>`,

This test is for verifying the label elements out of a form should be found.
Comment on attachment 8847144 [details]
Bug 1347176 - Implement label element extraction logic of an input field for filling form.;

https://reviewboard.mozilla.org/r/120168/#review132874

Thanks
Attachment #8847144 - Flags: review?(MattN+bmo) → review+
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/a9a0bb47b38d
Implement label element extraction logic of an input field for filling form.; r=MattN
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/a9a0bb47b38d
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Depends on: 1374724
As per San-Francisco meeting with :vchen, marking this bug as qe-.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.