
[Form Autofill] Implement offline heuristics to determine the fieldname of input fields

Status: NEW
Product: Toolkit
Component: Form Manager
Reported: 2 months ago
Modified: a day ago

People

(Reporter: lchang, Assigned: seanlee)

Tracking

(Depends on: 7 bugs, Blocks: 1 bug, {meta})

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [form autofill:M2])

Attachments

(1 attachment)

(Reporter)

Description

2 months ago
Implement basic heuristics based on regular expressions. For the MVP, we target en-US only.
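As a rough illustration of what "basic heuristics based on regular expressions" could look like for en-US, here is a minimal Python sketch. The rule table, its patterns, and the guess_field_name helper are hypothetical and chosen only for illustration; they are not the patterns Firefox ships. The first matching rule wins, so the order of the table matters.

```python
import re

# Hypothetical en-US rule table: field name -> regex tested against a
# field's label/name/id text. Illustrative only, not Firefox's patterns.
FIELD_RULES = [
    ("email", re.compile(r"e.?mail", re.I)),
    ("tel", re.compile(r"phone|mobile|tel", re.I)),
    ("given-name", re.compile(r"first.?name|given.?name|fname", re.I)),
    ("family-name", re.compile(r"last.?name|surname|family.?name|lname", re.I)),
    ("postal-code", re.compile(r"zip|postal", re.I)),
    ("street-address", re.compile(r"street|address", re.I)),
]


def guess_field_name(text):
    """Return the first field name whose regex matches `text`, or None."""
    for field_name, pattern in FIELD_RULES:
        if pattern.search(text):
            return field_name
    return None
```

For example, "Email Address" also contains "address", but the email rule is checked first, so guess_field_name("Email Address") returns "email".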
Whiteboard: [form autofill:MVP] → [form autofill:M2]
(Assignee)

Updated

a month ago
Assignee: nobody → selee
Comment hidden (mozreview-request)
(Assignee)

Updated

18 days ago
Keywords: meta
Summary: [Form Autofill] Implement basic heuristics based on regular expressions → [Form Autofill] Implement offline heuristics to determine the fieldname of input fields
(Assignee)

Comment 2

17 days ago
Hey MattN,

Since the heuristic logic is complex enough to be split into a few categories based on field type, this bug can become a meta bug covering the following bugs:
* Phone field
  - Implementation of how to determine a phone field.
* Email field
  - Implementation of how to determine an email field.
* Name field
  - Implementation of how to determine a name field relative to . 
* Address field
  - Implementation of how to determine an address field relative to country, street address, address line, state, organization, postal code, etc.
* Credit Card field
  - Implementation of how to determine a credit-card-related field, such as card number and expiration date (year/month). However, this could be in M3.
* Heuristic utils
  - Common logic, such as finding the label element that belongs to an input, or logic that goes beyond a single element.
* Test fixtures for the top 12 sites
  - Digest the web pages with forms on these sites and put the HTML parts into our tests as test fixtures.

I am going to file 7 bugs based on the above list. Could you comment on this breakdown? Thank you.
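The "finding the label element that belongs to an input" utility could be sketched as follows. This assumes well-formed markup parsed with Python's xml.etree.ElementTree (real pages would go through a DOM instead), and find_labels is a hypothetical helper. It covers the two common structures: an explicit <label for="id"> anywhere in the form, and a <label> that wraps the input.

```python
import xml.etree.ElementTree as ET


def find_labels(root, field):
    """Return the normalized text of every <label> associated with `field`."""
    labels = []
    # Explicit association: <label for="..."> matching the field's id.
    field_id = field.get("id")
    if field_id:
        for label in root.iter("label"):
            if label.get("for") == field_id:
                labels.append(label)
    # Implicit association: walk up from the field looking for a <label>.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = parents.get(field)
    while node is not None:
        if node.tag == "label" and node not in labels:
            labels.append(node)
        node = parents.get(node)
    # itertext() yields the label's text plus its descendants' text and tails;
    # split/join collapses the whitespace around nested elements.
    return [" ".join("".join(label.itertext()).split()) for label in labels]
```

Multiple labels are returned in order (explicit ones first), leaving it to the caller to decide how to combine them.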
Flags: needinfo?(MattN+bmo)
(Assignee)

Comment 3

17 days ago
Add some additional information.
(In reply to Sean Lee [:seanlee][:weilonge] from comment #2)
> Hey MattN,
> 
> Since the heuristic logic is complex enough to categorize them into the
> couple types based on field types, this bug can be a meta bug to include the
> following bugs:
> * Phone field
>   - Implementation of how to determine a phone field. 
> * Email field
>   - Implementation of how to determine a email field. 
> * Name field
>   - Implementation of how to determine a name field relative to . 
    relative to last name, first name, etc.
> * Address field
>   - Implementation of how to determine a address field relative to country,
> street address, address line, state, organization, postal code, etc.
> * Credit Card field
>   - Implementation of how to determine a credit card relative field like
> card number and expiration date (year/month). However, this could be in M3.
> * Heuristic utils
>   - Some common logics like finding the label element belongs to an input,
> the logic cross a single element, the section determination, etc.
> * The test fixtures for Top 12 site
>   - Digest the web pages with forms in these sites and put the html part
> into our test as the test fixtures.
  Also Shopify and WooCommerce
> 
> I am going to file 7 bugs based on the above list. Could you give these
> break-down a comment? Thank you.

Comment 4

(In reply to Sean Lee [:seanlee][:weilonge] from comment #3)
> Add some additional information.
> (In reply to Sean Lee [:seanlee][:weilonge] from comment #2)
> > Hey MattN,
> > 
> > Since the heuristic logic is complex enough to categorize them into the
> > couple types based on field types, this bug can be a meta bug to include the
> > following bugs:
> > * Phone field
> >   - Implementation of how to determine a phone field. 
> > * Email field
> >   - Implementation of how to determine a email field.

IMO this should be the first field type we work on, to figure out the APIs. I'm not sure we should file bugs for every data type until we know how the implementation will look… We may want more than one bug per type. For example, a phone number can span multiple fields, so we may want to break it down further: first handle the phone as one field, then handle it as three fields, etc.
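To illustrate the difference between "the phone as one field" and "the phone as three fields", here is a hypothetical Python sketch that groups fields already classified as phone fields into logical phone numbers. It assumes a US-style split detected purely by maxlength 3/3/4; the Field class and group_phone_fields are illustrative names, and real forms vary far more than this.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical US-style split: area code, prefix, line number.
US_SPLIT = (3, 3, 4)


@dataclass
class Field:
    name: str
    maxlength: Optional[int] = None


def group_phone_fields(fields: List[Field]) -> List[List[Field]]:
    """Group fields already classified as phone into logical phone numbers."""
    groups = []
    i = 0
    while i < len(fields):
        window = fields[i:i + 3]
        # Three consecutive fields sized 3/3/4 are treated as one number.
        if len(window) == 3 and tuple(f.maxlength for f in window) == US_SPLIT:
            groups.append(window)
            i += 3
        else:
            # Anything else is treated as a single-field phone number.
            groups.append([fields[i]])
            i += 1
    return groups
```

Handling the single-field case first and adding the split case later matches the incremental approach suggested above.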

> > * Name field
> >   - Implementation of how to determine a name field relative to last name, first name, etc.
> > * Address field
> >   - Implementation of how to determine a address field relative to country,
> > street address, address line, state, organization, postal code, etc.
> > * Credit Card field
> >   - Implementation of how to determine a credit card relative field like
> > card number and expiration date (year/month). However, this could be in M3.
> > * Heuristic utils
> >   - Some common logics like finding the label element belongs to an input,
> > the logic cross a single element,

I think this should be its own bug with unit tests that gets done first.

> > the section determination, etc.
> > * The test fixtures for Top 12 site
> >   - Digest the web pages with forms in these sites and put the html part
> > into our test as the test fixtures.
>   Also Shopify and WooCommerce

Good idea to add these pages to our tests.

> > 
> > I am going to file 7 bugs based on the above list. Could you give these
> > break-down a comment? Thank you.

I would file the bugs as we start working on them and know what they will look like, rather than filing bugs in advance and then having to reorganize them later. For now, I would file one bug on extracting labels and other relevant text to apply regexes to for a field, and another to implement the email field, since it's the simplest.
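Since email is suggested as the simplest first field, a first-pass detector might look like this minimal sketch. It assumes the label text, name and id have already been extracted for the field; is_email_field and its regex are hypothetical.

```python
import re

# Hypothetical pattern: "email", "e-mail", "e_mail", etc.
EMAIL_RE = re.compile(r"e.?mail", re.IGNORECASE)


def is_email_field(input_type, texts):
    """Guess whether a field is an email field.

    input_type: the <input> type attribute.
    texts: extracted label text, name and id (any may be empty).
    """
    if input_type == "email":
        return True  # The page already declared it.
    return any(EMAIL_RE.search(t) for t in texts if t)
```

Checking the declared type first avoids running regexes at all for well-annotated forms.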
Flags: needinfo?(MattN+bmo)
(Assignee)

Updated

9 days ago
Depends on: 1347176
(Assignee)

Comment 5

9 days ago
After studying other browsers' implementations, I filed bug 1347176 for the first utility patch, which extracts the label element.
(Assignee)

Comment 6

3 days ago
Hey MattN,

Here are some thoughts worth sharing after studying other browsers' implementations:
* The parseable name should be extracted from a field's information in the following order:
  - label text
  - name
  - id

* Find the longest common prefix of each attribute across the fields, and use the remainder as each field's parseable name. That helps distinguish the section and extract the real information for each field.

* Based on the above point, the FormAutofillHeuristics module should provide a new method (e.g. "getFormInfo") that processes every field in a form together. That's useful for handling prefix strings or any other cross-field processing.

* Since username and password are managed by LoginManager, conflicts between LoginManager and FormAutofill can occur for email and name fields. We need to discuss the expected behavior when both markAsAutofillField and markAsLoginManagerField are applied to the same field. IMO, if a field is marked as a login field, markAsAutofillField should have no effect even if it is called.

* Some sites (e.g. staples.com) use autocomplete="off" on input fields, and we should respect the "off" value by disabling the popup and autofill features.

* List the control types or cases that the form autofill feature is interested in, and check whether they match our expectations. [1]
  - text
  - email
  - tel
  - select-one (the difficulty would be in the filling part)
  - textarea
  - number
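The ordering and prefix ideas above can be sketched together in Python. Fields are plain dicts here (an assumption standing in for DOM elements), min_prefix is a made-up threshold that keeps short accidental prefixes, and strip_common_prefix and parseable_names are hypothetical helpers. os.path.commonprefix is used only because it compares a list of strings character by character. The stripped name prefix (e.g. a shared "billing_addr_") is returned separately, since it may hint at the section.

```python
import os.path


def strip_common_prefix(values, min_prefix=4):
    """Strip the longest common prefix shared by all values, if long enough."""
    prefix = os.path.commonprefix(values) if len(values) > 1 else ""
    stripped = [v[len(prefix):] for v in values]
    # Keep the originals if the prefix is too short or would empty a value.
    if len(prefix) < min_prefix or not all(stripped):
        return list(values), ""
    return stripped, prefix


def parseable_names(fields):
    """Pick label text, then stripped name, then stripped id for each field."""
    names, name_prefix = strip_common_prefix([f.get("name", "") for f in fields])
    ids, _ = strip_common_prefix([f.get("id", "") for f in fields])
    result = []
    for field, name, id_ in zip(fields, names, ids):
        for candidate in (field.get("label", ""), name, id_):
            if candidate:
                result.append(candidate)
                break
        else:
            result.append("")  # No usable information for this field.
    return result, name_prefix
```

Returning the prefix alongside the names leaves room for a form-level method (such as the suggested "getFormInfo") to use it for section detection.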

Could you take a look and give your feedback? Thank you.

[1] https://developer.mozilla.org/zh-TW/docs/Web/HTML/Element/input#Attributes
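The eligibility checks from comment 6 (control type, autocomplete="off", and LoginManager taking precedence) could be combined as in this sketch. The type strings match what HTMLInputElement.type, HTMLSelectElement.type ("select-one") and HTMLTextAreaElement.type ("textarea") report; is_autofill_candidate and the is_login_field flag are hypothetical, and treating login fields as ineligible reflects the IMO above, not a settled decision.

```python
# Control types listed in comment 6.
AUTOFILL_TYPES = {"text", "email", "tel", "select-one", "textarea", "number"}


def is_autofill_candidate(control_type, autocomplete="", is_login_field=False):
    """Decide whether form autofill should handle a field (hypothetical)."""
    if control_type not in AUTOFILL_TYPES:
        return False
    if autocomplete.strip().lower() == "off":
        return False  # Respect the page's opt-out.
    if is_login_field:
        return False  # LoginManager takes precedence over FormAutofill.
    return True
```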
Flags: needinfo?(MattN+bmo)

Comment 7

We discussed this in person; the notes are at https://docs.google.com/document/d/1yqEKXtJc6b_ixPUCrT8Gr2JtuH2vzLX9sAKp4sMtDCs/edit
Flags: needinfo?(MattN+bmo)
(Assignee)

Updated

a day ago
Depends on: 1349489
(Assignee)

Updated

a day ago
Depends on: 1349490
(Assignee)

Updated

a day ago
Depends on: 1349492
(Assignee)

Updated

a day ago
Depends on: 1349493
(Assignee)

Updated

a day ago
Depends on: 1349494
(Assignee)

Updated

a day ago
Depends on: 1349495
(Assignee)

Comment 8

a day ago
The following new bugs are based on the landing plan in comment 7:
1. [Bug 1349489] Test pages of the top 12 websites in xpcshell
2. [Bug 1347176] Label extraction logic, with tests for different kinds of label structures, including multiple labels.
3. [Bug 1349490] Implement the first version of the heuristic algorithm.
4. [Bug 1349492][parallel] Email field - Implementation of how to determine an email field.
5. [Bug 1349493][parallel] Phone field - Implementation of how to determine a phone field.
6. [Bug 1349494][parallel] Name field - Implementation of how to determine a name field relative to last name and first name.
7. [Bug 1349495][parallel] Address field - Implementation of how to determine an address field relative to country, street address, address line, state, organization, postal code, etc.

Items 4, 5, 6 and 7 can be implemented in parallel once the infrastructure (1, 2, 3) is done.