Bug 1645922 Comment 9 Edit History

So, the issue seems to be how we approach sections: we take a field, identify its section, and move on.

https://searchfox.org/mozilla-central/rev/5e6c7717255ca9638b2856c2b2058919aec1d21d/browser/extensions/formautofill/FormAutofillHeuristics.jsm#195-216

So, for a form with:

```html
  <form class="alignedLabels">
    <label>Name: <input autocomplete="name"></label>
    <label>Card Number: <input autocomplete="cc-number"></label>
    <label>Expiration month: <input autocomplete="cc-exp-month"></label>
    <label>Expiration year: <input autocomplete="cc-exp-year"></label>
    <label>CSC: <input autocomplete="cc-csc"></label>
    <p>
      <input type="submit" value="Submit">
      <button type="reset">Reset</button>
    </p>
  </form>
```

We take `name` and classify it as `address`, then move to `cc-number`, classify it as `creditCard`, and continue with the remaining fields. All the other fields end up in `creditCard`, while `name` stays alone in `address`.
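
A simplified model of that single pass reproduces the split. This is a sketch, not the real `FormAutofillHeuristics.jsm` logic: `classifySectionsSinglePass` and `sectionTypeOf` are hypothetical names, and the rule "anything not prefixed with `cc-` is an address field" is an assumption standing in for the real field-name categories.

```typescript
// Simplified model of a single-pass section classifier (hypothetical names;
// not the actual FormAutofillHeuristics.jsm implementation).
type SectionType = "address" | "creditCard";

interface Section {
  type: SectionType;
  fieldNames: string[];
}

// Assumption: autocomplete tokens starting with "cc-" are credit-card fields,
// and everything else, including a bare "name", is an address field.
function sectionTypeOf(fieldName: string): SectionType {
  return fieldName.startsWith("cc-") ? "creditCard" : "address";
}

// Each field is classified in isolation and appended to the current section,
// or starts a new section when its type differs. No field is revisited later.
function classifySectionsSinglePass(fieldNames: string[]): Section[] {
  const sections: Section[] = [];
  for (const fieldName of fieldNames) {
    const type = sectionTypeOf(fieldName);
    const current = sections[sections.length - 1];
    if (current !== undefined && current.type === type) {
      current.fieldNames.push(fieldName);
    } else {
      sections.push({ type, fieldNames: [fieldName] });
    }
  }
  return sections;
}

// The autocomplete tokens from the example form.
const fields = ["name", "cc-number", "cc-exp-month", "cc-exp-year", "cc-csc"];
console.log(JSON.stringify(classifySectionsSinglePass(fields)));
```

This prints two sections: an `address` section holding only `name`, and a `creditCard` section holding the four `cc-*` fields. Nothing in the loop can later pull `name` across once the card fields show up.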

I see two ways to resolve it:

1) Allow fields to be ambiguous. I don't think `name` is the only field that can be ambiguous, and as we add more field types, more fields may match multiple sections. It'll require additional logic to attach such fields to the surrounding sections (and, of course, it doesn't resolve the scenario where a `name` sits right between the `address` and `cc` sections).
2) Special-case `name` as a one-off: if it remains the only field in the `address` section, and the form has a `cc` section with no `cc-name`, move it into the `cc` section as `cc-name`.
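
Option 2 could be a small post-processing step over the already-built sections. A minimal sketch, assuming sections are plain `{ type, fieldNames }` records (simplified stand-ins, not the real objects the heuristics build); `foldLoneNameIntoCard` is a hypothetical name.

```typescript
// Hypothetical post-processing for option 2; section records are simplified
// stand-ins for whatever FormAutofillHeuristics.jsm actually builds.
type SectionType = "address" | "creditCard";

interface Section {
  type: SectionType;
  fieldNames: string[];
}

function foldLoneNameIntoCard(sections: Section[]): Section[] {
  // An address section whose only field is "name"...
  const nameIdx = sections.findIndex(
    (s) => s.type === "address" && s.fieldNames.length === 1 && s.fieldNames[0] === "name",
  );
  // ...and a credit-card section that has no "cc-name" of its own.
  const cardIdx = sections.findIndex(
    (s) => s.type === "creditCard" && s.fieldNames.indexOf("cc-name") === -1,
  );
  if (nameIdx === -1 || cardIdx === -1) {
    return sections; // Nothing to fix up; keep the original classification.
  }
  const nameComesFirst = nameIdx < cardIdx;
  return sections
    .map((s, i): Section => {
      if (i !== cardIdx) {
        return s;
      }
      // Re-label the field as "cc-name", keeping it on the same side of the
      // card fields as it appeared in the form.
      const fieldNames = nameComesFirst
        ? ["cc-name", ...s.fieldNames]
        : [...s.fieldNames, "cc-name"];
      return { type: s.type, fieldNames };
    })
    .filter((_, i) => i !== nameIdx); // Drop the address section whose only field moved.
}

const before: Section[] = [
  { type: "address", fieldNames: ["name"] },
  { type: "creditCard", fieldNames: ["cc-number", "cc-exp-month", "cc-exp-year", "cc-csc"] },
];
console.log(JSON.stringify(foldLoneNameIntoCard(before)));
```

The conditions are deliberately narrow, which is what makes this a hack: it only fires for a lone `name`, and it does nothing for other ambiguous fields.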


Chromium handles this differently, somewhat like (1): https://source.chromium.org/chromium/chromium/src/+/master:components/autofill/core/browser/form_parsing/form_field.cc;l=100-103;bpv=0;bpt=1?originalUrl=https:%2F%2Fcs.chromium.org%2F

They run their parsers over the fields in this order:
 - Address
 - Credit Card
 - Name

So, the `name` field is first classified by the `cc-name` matcher in the Credit Card pass, and since `cc-number` and `cc-exp` are also present, the whole section is returned. (Otherwise, they clear the markers and let the fields be classified by the Name pass and later added to Address.)
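
A rough sketch of that pass ordering, assuming fields are identified only by their label text and using made-up regexes (the real Chromium parsers are far more detailed); `classifyInPasses` and the `FieldType` values are hypothetical. The key step is that the Credit Card pass marks fields tentatively and commits only when both a number and an expiration field are found.

```typescript
// Hypothetical sketch of ordered passes (Address, Credit Card, Name) with a
// commit-or-rollback step in the Credit Card pass. Regexes are illustrative.
type FieldType = "address-line1" | "cc-name" | "cc-number" | "cc-exp" | "cc-csc" | "name";

function classifyInPasses(labels: string[]): (FieldType | null)[] {
  const lower = labels.map((l) => l.toLowerCase());
  const types: (FieldType | null)[] = lower.map(() => null);

  // Pass 1: Address. A bare "name" is deliberately left for later passes.
  lower.forEach((label, i) => {
    if (types[i] === null && /address|street/.test(label)) {
      types[i] = "address-line1";
    }
  });

  // Pass 2: Credit Card. Mark unclaimed fields tentatively first.
  const tentative = new Map<number, FieldType>();
  lower.forEach((label, i) => {
    if (types[i] !== null) return;
    if (/card number/.test(label)) tentative.set(i, "cc-number");
    else if (/expiration/.test(label)) tentative.set(i, "cc-exp");
    else if (/csc|cvv/.test(label)) tentative.set(i, "cc-csc");
    else if (/name/.test(label)) tentative.set(i, "cc-name");
  });
  const seen = new Set<FieldType>();
  tentative.forEach((type) => seen.add(type));
  // Commit only if this looks like a real card section; otherwise the
  // tentative markers are dropped and the fields stay unclassified.
  if (seen.has("cc-number") && seen.has("cc-exp")) {
    tentative.forEach((type, i) => {
      types[i] = type;
    });
  }

  // Pass 3: Name. Any "name" still unclaimed becomes a plain name field,
  // which would later be attached to the address side.
  lower.forEach((label, i) => {
    if (types[i] === null && /name/.test(label)) {
      types[i] = "name";
    }
  });
  return types;
}

console.log(classifyInPasses(["Name", "Card Number", "Expiration month", "Expiration year", "CSC"]));
console.log(classifyInPasses(["Name", "Street address"]));
```

With the card form, `Name` ends up as `cc-name`; with only a street address next to it, the card markers are rolled back and `Name` falls through to the Name pass.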

IMHO, that's the right approach and will pay off long term, but it's a non-trivial rewrite of the heuristics, and it diverges from the concept of sections. It doesn't necessarily erase them, but it does shuffle the algorithm around: instead of a single `O(n)` pass that attaches sections as we go, we'd make three passes (`O(3n)`), first classifying all fields and only then checking for contiguous sections.
At that point, I'm not sure sections will still bring any value.

I believe there's little value in debugging the Card Type saving part of this bug unless we first fix the Name classification bug, either via a dirty hack or an algorithmic rewrite.

[edited] Previously I assumed that our heuristics struggle with a field whose id is `name`. Since then, I verified that the field on Etsy is named `cc-name`, but it gets classified as `address#name`. Debugging why.

So, the issue seems to be how we approach sections: we take a field, identify its section, and move on.

https://searchfox.org/mozilla-central/rev/5e6c7717255ca9638b2856c2b2058919aec1d21d/browser/extensions/formautofill/FormAutofillHeuristics.jsm#195-216

So, on Etsy, we take the `cc-name` field, classify it as `name`, and put it in the `address` section.

Then we move to `cc-number`, classify it as `creditCard`, and continue with the remaining fields. All the other fields end up in `creditCard`, while `name` stays in `address`.

I see two ways to resolve it:

1) Find a way to *not* classify `cc-name` as `name`.
2) Special-case `name` as a one-off: if it remains the only field in the `address` section, and the form has a `cc` section with no `cc-name`, move it into the `cc` section as `cc-name`.
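
This comment doesn't yet say where the misclassification comes from. One hypothetical cause worth checking is pattern order: if a generic `name` pattern is tried before a credit-card-specific one, an attribute like `cc-name` matches the generic one first. The sketch below only illustrates that ordering effect; `matchFieldName`, the rules, and the regexes are made up and are not Firefox's actual heuristics.

```typescript
// Hypothetical attribute matcher; the patterns and rule order are
// illustrative, not Firefox's actual heuristics regexes.
interface Rule {
  fieldName: string;
  pattern: RegExp;
}

// First matching rule wins, so more specific rules must come first.
function matchFieldName(attr: string, rules: Rule[]): string | null {
  for (const rule of rules) {
    if (rule.pattern.test(attr)) {
      return rule.fieldName;
    }
  }
  return null;
}

const ccNameRule: Rule = { fieldName: "cc-name", pattern: /cc-?name|card.?holder/i };
const nameRule: Rule = { fieldName: "name", pattern: /name/i };

// Generic rule first: "cc-name" is swallowed by the plain name pattern.
const genericFirst = [nameRule, ccNameRule];
// Specific rule first: "cc-name" is recognized, plain "name" still works.
const specificFirst = [ccNameRule, nameRule];

console.log(matchFieldName("cc-name", genericFirst));
console.log(matchFieldName("cc-name", specificFirst));
console.log(matchFieldName("name", specificFirst));
```

If the real cause turns out to be something else (for example, the credit-card pattern not matching at all), this ordering change alone wouldn't help.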


Chromium handles this differently: https://source.chromium.org/chromium/chromium/src/+/master:components/autofill/core/browser/form_parsing/form_field.cc;l=100-103;bpv=0;bpt=1?originalUrl=https:%2F%2Fcs.chromium.org%2F

They run their parsers over the fields in this order:
 - Address
 - Credit Card
 - Name

So, the `name` field is first classified by the `cc-name` matcher in the Credit Card pass, and since `cc-number` and `cc-exp` are also present, the whole section is returned. (Otherwise, they clear the markers and let the fields be classified by the Name pass and later added to Address.)

IMHO, that's the right approach and will pay off long term, but it's a non-trivial rewrite of the heuristics, and it diverges from the concept of sections. It doesn't necessarily erase them, but it does shuffle the algorithm around: instead of a single `O(n)` pass that attaches sections as we go, we'd make three passes (`O(3n)`), first classifying all fields and only then checking for contiguous sections.
At that point, I'm not sure sections will still bring any value.

I believe there's little value in debugging the Card Type saving part of this bug unless we first fix the Name classification bug, either via a dirty hack or an algorithmic rewrite.