[Form Autofill] Implement offline heuristics to determine the fieldname of input fields

NEW
Unassigned

Status

2 years ago
3 months ago

People

(Reporter: lchang, Unassigned)

Tracking

(Depends on: 7 bugs, Blocks: 1 bug, {meta})

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [form autofill])

Attachments

(1 attachment)

(Reporter)

Description

2 years ago
Implement basic heuristics based on regular expressions. In MVP, we target en-US only for now.
Whiteboard: [form autofill:MVP] → [form autofill:M2]
Assignee: nobody → selee
Keywords: meta
Summary: [Form Autofill] Implement basic heuristics based on regular expressions → [Form Autofill] Implement offline heuristics to determine the fieldname of input fields
Hey MattN,

Since the heuristic logic is complex enough to be split into several categories by field type, this bug can be a meta bug that covers the following bugs:
* Phone field
  - Implementation of how to determine a phone field. 
* Email field
  - Implementation of how to determine an email field.
* Name field
  - Implementation of how to determine a name field relative to . 
* Address field
  - Implementation of how to determine an address-related field, such as country, street address, address line, state, organization, postal code, etc.
* Credit Card field
  - Implementation of how to determine a credit-card-related field, such as card number and expiration date (year/month). However, this could be in M3.
* Heuristic utils
  - Common logic, such as finding the label element that belongs to an input, or logic that spans more than a single element.
* Test fixtures for the top 12 sites
  - Extract the pages with forms from these sites and add their HTML to our tests as fixtures.

I am going to file 7 bugs based on the above list. Could you comment on this breakdown? Thank you.
Flags: needinfo?(MattN+bmo)
Add some additional information.
(In reply to Sean Lee [:seanlee][:weilonge] from comment #2)
> Hey MattN,
> 
> Since the heuristic logic is complex enough to categorize them into the
> couple types based on field types, this bug can be a meta bug to include the
> following bugs:
> * Phone field
>   - Implementation of how to determine a phone field. 
> * Email field
>   - Implementation of how to determine a email field. 
> * Name field
>   - Implementation of how to determine a name field relative to . 
    relative to last name, first name, etc.
> * Address field
>   - Implementation of how to determine a address field relative to country,
> street address, address line, state, organization, postal code, etc.
> * Credit Card field
>   - Implementation of how to determine a credit card relative field like
> card number and expiration date (year/month). However, this could be in M3.
> * Heuristic utils
>   - Some common logics like finding the label element belongs to an input,
> the logic cross a single element, the section determination, etc.
> * The test fixtures for Top 12 site
>   - Digest the web pages with forms in these sites and put the html part
> into our test as the test fixtures.
  Also Shopify and WooCommerce
> 
> I am going to file 7 bugs based on the above list. Could you give these
> break-down a comment? Thank you.
(In reply to Sean Lee [:seanlee][:weilonge] from comment #3)
> Add some additional information.
> (In reply to Sean Lee [:seanlee][:weilonge] from comment #2)
> > Hey MattN,
> > 
> > Since the heuristic logic is complex enough to categorize them into the
> > couple types based on field types, this bug can be a meta bug to include the
> > following bugs:
> > * Phone field
> >   - Implementation of how to determine a phone field. 
> > * Email field
> >   - Implementation of how to determine a email field.

IMO this should be the first field type that we work on to figure out the APIs. I'm not sure we should file bugs for every data type until we know how the implementation will look. We may want more than one bug per type: for example, a phone number can span multiple fields, so we may want to break it down further and first handle the phone as one field, then as three fields, etc.

> > * Name field
> >   - Implementation of how to determine a name field relative to last name, first name, etc.
> > * Address field
> >   - Implementation of how to determine a address field relative to country,
> > street address, address line, state, organization, postal code, etc.
> > * Credit Card field
> >   - Implementation of how to determine a credit card relative field like
> > card number and expiration date (year/month). However, this could be in M3.
> > * Heuristic utils
> >   - Some common logics like finding the label element belongs to an input,
> > the logic cross a single element,

I think this should be its own bug with unit tests that gets done first.

> the section determination, etc.
> > * The test fixtures for Top 12 site
> >   - Digest the web pages with forms in these sites and put the html part
> > into our test as the test fixtures.
>   Also Shopify and WooCommerce

Good idea to add these pages to our tests.

> > 
> > I am going to file 7 bugs based on the above list. Could you give these
> > break-down a comment? Thank you.

I would file the bugs as we start working on them and know how the bugs will look rather than filing bugs in advance and then having to reorganize them later. For now I would file a bug on extracting labels and other relevant text to apply regexes for a field, and also to implement the email field since it's the simplest.
Flags: needinfo?(MattN+bmo)
Depends on: 1347176
After studying the implementations in other browsers, I filed bug 1347176 for the first utility patch, which extracts the label element.
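
To make the scope of bug 1347176 concrete, here is a minimal sketch of label extraction, written in Python for brevity. It is not the actual patch. `LabelCollector` and `extract_labels` are hypothetical names, and the sketch assumes that every control of interest has an `id`. It associates a label with a control either through the label's `for` attribute or by nesting, and keeps every label when a control has more than one.

```python
from html.parser import HTMLParser


class LabelCollector(HTMLParser):
    """Collect label text per control id (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.labels = {}   # control id -> list of label strings
        self._open = None  # state for the <label> currently being parsed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label":
            self._open = {"for": attrs.get("for"), "text": [], "nested": []}
        elif tag in ("input", "select", "textarea") and self._open is not None:
            # A control nested inside a <label> is labelled by it.
            if attrs.get("id"):
                self._open["nested"].append(attrs["id"])

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"].append(data)

    def handle_endtag(self, tag):
        if tag != "label" or self._open is None:
            return
        # Collapse runs of whitespace in the label text.
        text = " ".join("".join(self._open["text"]).split())
        # An explicit `for` attribute wins over nesting.
        if self._open["for"]:
            targets = [self._open["for"]]
        else:
            targets = self._open["nested"]
        for control_id in targets:
            self.labels.setdefault(control_id, []).append(text)
        self._open = None


def extract_labels(html):
    """Return a dict mapping control ids to their label strings."""
    collector = LabelCollector()
    collector.feed(html)
    collector.close()
    return collector.labels
```

A simplification worth noting: text inside a nested `select` (its options) would be counted as label text here, which a real implementation would probably want to exclude.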
Hey MattN,

Here are some thoughts worth sharing after studying other browsers' implementations:
* The parseable name should be extracted from a field's information in this order:
  - label text
  - name
  - id

* Find the longest prefix shared across fields in each attribute when deriving the parseable name. That helps distinguish sections and extract the real information for each field.

* Based on the point above, the FormAutofillHeuristics module should provide a new method (e.g. "getFormInfo") that processes every field of a form together. That is useful for handling prefix strings and any other cross-field processing.

* Since usernames and passwords are managed by LoginManager, conflicts between LoginManager and FormAutofill can occur for email and name fields. We need to discuss the expected behavior when both markAsAutofillField and markAsLoginManagerField are applied to the same field. IMO, if a field is marked as a login field, markAsAutofillField should have no effect even if it is called.

* Some sites (e.g. staples.com) set autocomplete="off" on input fields, and we should respect "off" by disabling the popup and autofill features for those fields.

* List the control types or cases that the form autofill feature is interested in, and check whether they match our expectations. [1]
  - text
  - email
  - tel
  - select-one (the difficulty would be the filling part.)
  - textarea
  - number
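
As a rough illustration of how several of the points above could fit together, here is a minimal sketch in Python. It is only one reading of the list, not the planned implementation. `get_form_info` stands in for the suggested "getFormInfo" method, the dict-based field representation is an assumption made for the example, and stripping the shared prefix from `name` and `id` is my interpretation of the prefix point. The LoginManager conflict is left out because its expected behavior is still open.

```python
import os

# Control types listed above as interesting for form autofill.
SUPPORTED_TYPES = {"text", "email", "tel", "select-one", "textarea", "number"}


def is_eligible(field):
    """Skip fields that opt out via autocomplete="off" or have other types."""
    if (field.get("autocomplete") or "").strip().lower() == "off":
        return False
    return field.get("type", "text") in SUPPORTED_TYPES


def strip_common_prefix(values):
    """Remove the longest prefix shared by all non-empty values."""
    non_empty = [v for v in values if v]
    if len(non_empty) < 2:
        return list(values)
    # os.path.commonprefix compares character by character, so it works
    # on arbitrary strings, not only on paths.
    prefix = os.path.commonprefix(non_empty)
    if any(v == prefix for v in non_empty):
        return list(values)  # assumption: never strip a value down to ""
    return [v[len(prefix):] for v in values]


def get_form_info(fields):
    """Return the parseable name of every eligible field in one form."""
    eligible = [f for f in fields if is_eligible(f)]
    names = strip_common_prefix([f.get("name") or "" for f in eligible])
    ids = strip_common_prefix([f.get("id") or "" for f in eligible])
    info = []
    for field, name, id_ in zip(eligible, names, ids):
        # Order of preference: label text, then name, then id.
        parseable = field.get("label") or name or id_
        info.append({"id": field.get("id") or "", "parseable_name": parseable})
    return info
```

Processing the whole form in one call is what makes the prefix step possible, since a shared prefix such as `billing_` can only be found by looking at several fields at once.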

Could you take a look and give your feedback? Thank you.

[1] https://developer.mozilla.org/zh-TW/docs/Web/HTML/Element/input#Attributes
Flags: needinfo?(MattN+bmo)
Depends on: 1349489
Depends on: 1349490
Depends on: 1349492
Depends on: 1349493
Depends on: 1349494
Depends on: 1349495
The following new bugs are based on the landing plan in comment 7:
1. [Bug 1349489] Test pages of top 12 web sites in xpcshell 
2. [Bug 1347176] Label extraction logic, with test patterns for different kinds of label structure, including multiple labels.
3. [Bug 1349490] Implement the first version of heuristic algorithm.
4. [Bug 1349492][parallel] Email field  - Implementation of how to determine an email field.
5. [Bug 1349493][parallel] Phone field  - Implementation of how to determine a phone field.
6. [Bug 1349494][parallel] Name field  - Implementation of how to determine a name-related field, such as last name and first name.
7. [Bug 1349495][parallel] Address field  - Implementation of how to determine an address-related field, such as country, street address, address line, state, organization, postal code, etc.

Items 4, 5, 6, and 7 can be implemented in parallel once the infrastructure (items 1, 2, and 3) is done.
The following field types will be implemented in their respective bugs for milestone 2.
That is, the heuristics should be able to recognize these field types using the autocomplete attribute and regular expressions.

(In reply to Sean Lee [:seanlee][:weilonge] from comment #8)
> 4. [Bug 1349492][parallel] Email field  - Implementation of how to determine
> a email field.
* email

> 5. [Bug 1349493][parallel] Phone field  - Implementation of how to determine
> a phone field.
* tel

> 6. [Bug 1349494][parallel] Name field  - Implementation of how to determine
> a name field relative to last name and first name.
* name
* given-name
* additional-name
* family-name

> 7. [Bug 1349495][parallel] Address field  - Implementation of how to
> determine an address field relative to country, street address, address
> line, state, organization, postal code, etc.
* organization
* street-address
* address-line1
* address-line2
* address-line3
* address-level2
* address-level1
* postal-code
* country
* country-name
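
To show what "autocomplete attribute and regexp" could look like for the field types above, here is a minimal sketch in Python. The field names come from the list above, but the regular expressions are illustrative en-US guesses of mine, not patterns taken from any browser, and `infer_field_name` is a hypothetical helper. The autocomplete attribute is checked first, and the regexes are only a fallback over texts such as the label, name, and id.

```python
import re

# Field names targeted for milestone 2 (from the list above).
SUPPORTED_FIELD_NAMES = {
    "email", "tel", "name", "given-name", "additional-name", "family-name",
    "organization", "street-address", "address-line1", "address-line2",
    "address-line3", "address-level2", "address-level1", "postal-code",
    "country", "country-name",
}

# Illustrative en-US patterns only (assumption). Order matters: more
# specific patterns come before generic ones such as "name".
REGEXES = [
    ("email", r"e-?mail"),
    ("tel", r"phone|mobile|\btel\b"),
    ("given-name", r"first.*name|given.*name"),
    ("family-name", r"last.*name|family.*name|surname"),
    ("additional-name", r"middle.*name|additional.*name"),
    ("postal-code", r"zip|postal.*code"),
    ("address-level2", r"city|town"),
    ("address-level1", r"state|province|region"),
    ("organization", r"company|organi[sz]ation"),
    ("country", r"country"),
    ("street-address", r"street|address"),
    ("name", r"name"),
]


def infer_field_name(autocomplete, texts):
    """Guess a field name from the autocomplete value, then from texts."""
    # The attribute may carry several tokens, e.g. "shipping postal-code".
    for token in (autocomplete or "").lower().split():
        if token in SUPPORTED_FIELD_NAMES:
            return token
    # Fall back to regexes over label text, name, id, in that order.
    for text in texts:
        for field_name, pattern in REGEXES:
            if re.search(pattern, text or "", re.IGNORECASE):
                return field_name
    return None
```

For example, `infer_field_name("shipping postal-code", [])` returns `"postal-code"` from the attribute alone, while `infer_field_name("", ["First name"])` falls back to the regexes and returns `"given-name"`. A generic pattern like `name` will also match unrelated fields such as a username, which is the kind of case the test fixtures are meant to catch.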
Depends on: 1361237
Depends on: 1368858
Depends on: 1368872
(Reporter)

Updated

2 years ago
Whiteboard: [form autofill:M2] → [form autofill]
After confirming offline with Emma, the `stale-bug` keyword can be removed since this is a meta bug.
Keywords: stale-bug
Component: Form Manager → Form Autofill
Moving to p3 because no activity for at least 24 weeks.
Priority: P1 → P3
Meta bugs don't need priorities or assignees.
Assignee: selee → nobody
OS: Mac OS X → All
Priority: P3 → --
Hardware: x86 → All