1392940 - [Form Autofill] Detect cc-exp-year followed by cc-exp-month

Reporter

Description

•

8 years ago

cc-exp-year fields in some sites (e.g. CDW and CostCo) can not be detected correctly, and we should find a solution for it especially for CDW and CostCo.

Sean Lee [:seanlee][:weilonge]

Reporter

Updated

•

8 years ago

Summary: Detect cc-exp-year followed by cc-exp-month → [Form Autofill] Detect cc-exp-year followed by cc-exp-month

Ray Lin[:ralin]

Assignee

Updated

•

7 years ago

Assignee: nobody → ralin

Ray Lin[:ralin]

Assignee

Comment 1

•

7 years ago

I'm making a patch try to add a additional detection to see if the field is "expiry month like" or "expiry year like" since current regex is too loose to distinguish them or birthday fields.

Status: NEW → ASSIGNED

Sean Lee [:seanlee][:weilonge]

Reporter

Comment 2

•

7 years ago

Based on the latest result on 2017/9/11, these sites have the cc-exp* issues: * CDW - "cc-exp-month" and "cc-exp-year" should be detected, but no cc-exp-year is detected actually. * CostCo&Staples - "cc-exp" is misdetected as "cc-exp-month"

Comment hidden (mozreview-request)

Ray Lin[:ralin]

Assignee

Comment 4

•

7 years ago

(In reply to Sean Lee [:seanlee][:weilonge] from comment #2) > Based on the latest result on 2017/9/11, these sites have the cc-exp* issues: > * CDW - "cc-exp-month" and "cc-exp-year" should be detected, but no > cc-exp-year is detected actually. Now should be able to detect cc-exp-year correctly after applied the WIP patch. > * CostCo&Staples - "cc-exp" is misdetected as "cc-exp-month" For Costco, I found another issue keep me away from inspecting the heuristic results as the CC-number is in an iframe, not sure that should be addressed elsewhere. Staples uses a combined expiration date "MM/YY", which currently couldn't not be detected by regex or option values, I'll see whether it's fine to fallback to "cc-exp" in any case if we can tell there's only one cc-exp-* related field.

Matthew N. [:MattN]

Updated

•

7 years ago

Priority: -- → P3

Ray Lin[:ralin]

Assignee

Comment 5

•

7 years ago

For "cc-exp-year followed by cc-exp-month", current WIP can correctly detect this case, at least in unit test. However, I had a hard time finding a pattern to match cc-exp for Costco and Staples under current parsing strategy. Plus, with the interference of birthday fields, it's more difficult to tell what the field is supposed to be. I will try to introduce some states in our scanner to help us persist some additional parsed data in order to have more details to run a final adjustment after going through the form. Though, we may need more smart check for different format of "cc-exp" for enhancement in the future, since it is nothing different but a normal text input for us if we failed to detect with regex.

Ray Lin[:ralin]

Assignee

Updated

•

7 years ago

Blocks: 1400760

Ray Lin[:ralin]

Assignee

Comment 6

•

7 years ago

After discussion with Sean today, I can successfully identify cc-exp on live Costco & Staple \o/. I'll rewrite the current ugly patch and go through the cc-exp related tests again to ensure I don't miss any case.

Comment hidden (mozreview-request)

Matthew N. [:MattN]

Comment 8

•

7 years ago

mozreview-review

Comment on attachment 8907926 [details] Bug 1392940 - Add expiration date parser to detect cc-exp family fields. https://reviewboard.mozilla.org/r/179606/#review188550 I think this is okay but it would be really good to have focused unit tests and I would like to understand where the heuristics came from. ::: browser/extensions/formautofill/FormAutofillHeuristics.jsm:304 (Diff revision 2) > + _isExpirationMonthLikely(fieldScanner) { > + const detail = fieldScanner.getFieldDetailByIndex(fieldScanner.parsingIndex); Please add a JSDoc comment to give a high-level description of what this does. ::: browser/extensions/formautofill/FormAutofillHeuristics.jsm:304 (Diff revision 2) > + _isExpirationMonthLikely(fieldScanner) { > + const detail = fieldScanner.getFieldDetailByIndex(fieldScanner.parsingIndex); How much of this is based on Chromium? It would be good to have cross-references with Chromium for things that are based on them. ::: browser/extensions/formautofill/FormAutofillHeuristics.jsm:315 (Diff revision 2) > + } > + > + const options = [...element.options]; > + const desiredValues = Array(12).fill(1).map((v, i) => v + i); > + > + // The amount of month select options should not less than 12 and bigger than "The number of month options shouldn't be less than 12 or larger than 13 including the default option." ::: browser/extensions/formautofill/FormAutofillHeuristics.jsm:325 (Diff revision 2) > + _isExpirationYearLikely(fieldScanner) { > + const detail = fieldScanner.getFieldDetailByIndex(fieldScanner.parsingIndex); Same here: Please add a JSDoc comment to give a high-level description of what this does. ::: browser/extensions/formautofill/FormAutofillHeuristics.jsm:334 (Diff revision 2) > + // A normal expiration year should contain at least three options that the values > + // start from this year. Where did we get this idea from? Is it our own or from Chromium? I wouldn't be surprised if some sites used a static year list with years in the past. I guess the code will handle finding that subarray but the comment kinda makes it seem like we require it to start at the current year. ::: browser/extensions/formautofill/FormAutofillHeuristics.jsm:458 (Diff revision 2) > + const savedIndex = fieldScanner.parsingIndex; > + const monthAndYearFieldNames = ["cc-exp-month", "cc-exp-year"]; > + const detail = fieldScanner.getFieldDetailByIndex(fieldScanner.parsingIndex); > + const element = detail.elementWeakRef.get(); > + > + // Skip the uninterested fields Nit: uninteresting ::: browser/extensions/formautofill/FormAutofillHeuristics.jsm:463 (Diff revision 2) > + // If the input type is a month picker, then assume it's cc-exp. > + if (element.type == "month") { Do we actually work on type=month? I didn't think it was an eligible type and I don't remember if the scanner only includes eligible types. Suggestions: a) We should add some unit tests for type=month, not just testing the fixtures. b) I think we really need better documentation for the heuristics engine. Both JSDoc comments and maybe higher-level stuff in documentation files like https://hg.mozilla.org/mozilla-central/rev/fd4b9c72b832 did for payments. ::: browser/extensions/formautofill/FormAutofillHeuristics.jsm:463 (Diff revision 2) > + // If the input type is a month picker, then assume it's cc-exp. > + if (element.type == "month") { Should we consider type=year for cc-exp-year?

Attachment #8907926 - Flags: review?(MattN+bmo)

Ray Lin[:ralin]

Assignee

Comment 9

•

7 years ago

mozreview-review-reply

Comment on attachment 8907926 [details] Bug 1392940 - Add expiration date parser to detect cc-exp family fields. https://reviewboard.mozilla.org/r/179606/#review188550 Thanks for the review :D It came from Chromium, but I didn't attempt to include all for now. Compare our implementation with Chromium parser, we don't have global states and that much helper functions to deal with some complex cases, and that's what we need to do better parsing. Perhaps, we should bring more ideas of scanner design from Chromium. will fix: - focused unit test - missing comments - typo > Where did we get this idea from? Is it our own or from Chromium? I wouldn't be surprised if some sites used a static year list with years in the past. I guess the code will handle finding that subarray but the comment kinda makes it seem like we require it to start at the current year. It's from Chromium. Code can handle that case, my bad didn't make the comment clear. > Should we consider type=year for cc-exp-year? I remember we don't have type=year. I'll check. > Do we actually work on type=month? I didn't think it was an eligible type and I don't remember if the scanner only includes eligible types. > > Suggestions: > a) We should add some unit tests for type=month, not just testing the fixtures. > b) I think we really need better documentation for the heuristics engine. Both JSDoc comments and maybe higher-level stuff in documentation files like https://hg.mozilla.org/mozilla-central/rev/fd4b9c72b832 did for payments. Oh, I should have added month in our eligible list. a) seems more actionable for me, and for b) I think Sean would be a right person to do higher-level documentation from scratch.

Comment hidden (mozreview-request)

Ray Lin[:ralin]

Assignee

Comment 11

•

7 years ago

In the new revision: - Add an unit test for cc-exp related fields - Add the missing JSDoc - type fixed Sean is still thinking how to start the higher-level document, per discussion this morning, I would wait for his comment in case any concerns I can't think of. Thanks

Ray Lin[:ralin]

Assignee

Comment 12

•

7 years ago

Oops... s/type/typo/ :P

Ray Lin[:ralin]

Assignee

Comment 13

•

7 years ago

Sean, as we talked about the documentation the other day, any thoughts? Thanks

Flags: needinfo?(selee)

Sean Lee [:seanlee][:weilonge]

Reporter

Comment 14

•

7 years ago

Hey MattN, May we file another bug to implement the documentation of the heuristics architecture? I would like to include the following description: 1. The usage of `FieldScanner`. This is a set of utilities to implement a parser. 2. The detail idea of each parser. * Phone parser - intertesting types: tel* fields * Address parser - intertesting types: address-line[1-3] * Credit Card Expire date parser - cc-exp* 3. The detail of regexps - How to apply each regexp to `id`, `name`, and the label strings of an input or select element. 4. API dependency - element.getAutocompleteInfo(), etc. Hey Ray, I am reviewing your latest patch. Could you take a look the above idea in the meantime? Thanks.

Flags: needinfo?(selee) → needinfo?(MattN+bmo)

Sean Lee [:seanlee][:weilonge]

Reporter

Updated

•

7 years ago

Blocks: 1404769

Sean Lee [:seanlee][:weilonge]

Reporter

Comment 15

•

7 years ago

bug 1404769 is just filed for the documentation discussion.

Sean Lee [:seanlee][:weilonge]

Reporter

Comment 16

•

7 years ago

mozreview-review

Comment on attachment 8907926 [details] Bug 1392940 - Add expiration date parser to detect cc-exp family fields. https://reviewboard.mozilla.org/r/179606/#review190380 Thanks for the patch! In our offline discussion, our basic idea is based on `for` or `while` loop, e.g. address parser. Once the pattern is matched to `cc-exp-month` and `cc-exp-year`, the field name can be updated at once. The benefit is to separate the matching and updating part, and this won't change `fieldScanner.parsingIndex` during matching. May I know your concern to implement the parser in a loop? ::: browser/extensions/formautofill/FormAutofillHeuristics.jsm:467 (Diff revision 3) > } > > return parsedFields; > }, > > + _parseExpirationDateFields(fieldScanner) { Please add JSDoc for this parser like other parsers. ::: browser/extensions/formautofill/FormAutofillHeuristics.jsm:467 (Diff revision 3) > } > > return parsedFields; > }, > > + _parseExpirationDateFields(fieldScanner) { nit: I think it's worth to include `CreditCard` in the function name, e.g. _parseCreditCardExpirationDateFields or _parseCreditCardExpDateFields ::: browser/extensions/formautofill/FormAutofillHeuristics.jsm:483 (Diff revision 3) > + if (!detail || !["cc-exp", ...monthAndYearFieldNames].includes(detail.fieldName)) { > + return false; > + } > + > + // If the input type is a month picker, then assume it's cc-exp. > + if (element.type == "month") { Does it miss an `input` element check since the comment does mention the input type? ::: browser/extensions/formautofill/FormAutofillHeuristics.jsm:492 (Diff revision 3) > + return true; > + } > + > + // Don't process the fields if expiration month and expiration year are already > + // matched by regex in correct order. > + let matchedMonthAndYearInOrder = monthAndYearFieldNames.every((expFieldName, i) => { any usage of the argument `i`? ::: browser/extensions/formautofill/FormAutofillHeuristics.jsm:498 (Diff revision 3) > + if (fieldScanner.parsingFinished) { > + return false; > + } > + > + let {fieldName} = fieldScanner.getFieldDetailByIndex(fieldScanner.parsingIndex); > + fieldScanner.parsingIndex++; IIUC, this code tries to move forward the index, and it means the temporary movement. If so, the temporary movement should be applied to `savedIndex` rather than `fieldScanner.parsingIndex`. `fieldScanner.parsingIndex` will be increased only when a field is given a confident fieldName. ::: browser/extensions/formautofill/FormAutofillHeuristics.jsm:504 (Diff revision 3) > + return fieldName == expFieldName; > + }); > + if (matchedMonthAndYearInOrder) { > + return true; > + } > + fieldScanner.parsingIndex = savedIndex; This line looks like to restore the index back when there is no pattern matched. Like I said above, we should operate the temporary index. ::: browser/extensions/formautofill/FormAutofillHeuristics.jsm:518 (Diff revision 3) > + fieldScanner.parsingIndex++; > + > + return true; > + } > + } > + fieldScanner.parsingIndex = savedIndex; see above.

Attachment #8907926 - Flags: review?(selee)

Ray Lin[:ralin]

Assignee

Comment 17

•

7 years ago

mozreview-review-reply

Comment on attachment 8907926 [details] Bug 1392940 - Add expiration date parser to detect cc-exp family fields. https://reviewboard.mozilla.org/r/179606/#review190380 I'll go through the issues later :D I try imitate and achieve what Chromium can do, I think we might have to consider the approach rather than putting all in one round parsing for a single field family, and record some states and control the cursor once needed. The distinction is like that we don't know how the fields will evolve in the first pass, we might need to walk back if we do a wrong guess for exp-year. And also another concern is considering if it's a regular parser, mostly we don't have certain parsing window for a specific token, sometimes a long traversal is required to determine or fix the (original) parsing result, so, I don't think we can benefit from the fixed length loop without stepping back.

Ray Lin[:ralin]

Assignee

Comment 18

•

7 years ago

For example, in the first pass, it's possible that the current cursor point to the field that can be both cc-exp and cc-exp-month, and verify whether the next field is cc-exp-year or not to decide the next move. If it's not cc-exp-year, should we just assume the first field is cc-exp? or we should do another shot to step back to first field and do further tests to verify if it's a real cc-exp. Apart from the this example, like in Chromium has several passes to deal with edge cases, it brings me the potential complexity if we're going to do more things, putting them in one loop might increase the difficulty of maintenance. Not sure if other concerns, but currently I would think controlling the cursor can also help to sort out the cases mixed with birthday fields.

Ray Lin[:ralin]

Assignee

Comment 19

•

7 years ago

s/and verify/so we have to/

Sean Lee [:seanlee][:weilonge]

Reporter

Comment 20

•

7 years ago

Percicely(In reply to Ray Lin[:ralin] from comment #17) > Comment on attachment 8907926 [details] > Bug 1392940 - Add expiration date parser to detect cc-exp family fields. > > https://reviewboard.mozilla.org/r/179606/#review190380 > > I'll go through the issues later :D > > I try imitate and achieve what Chromium can do, I think we might have to > consider the approach rather than putting all in one round parsing for a > single field family, and record some states and control the cursor once > needed. The distinction is like that we don't know how the fields will > evolve in the first pass, we might need to walk back if we do a wrong guess > for exp-year. And also another concern is considering if it's a regular > parser, mostly we don't have certain parsing window for a specific token, > sometimes a long traversal is required to determine or fix the (original) > parsing result, so, I don't think we can benefit from the fixed length loop > without stepping back. I am fine with the current if-condition form since the continous pattern only in two fields at most. What my concern is more about the index operation. The index pushing or poping is unnecessary if we change the fieldNames at once. (Please see my other comments) Once the pattern matching is failed, the parser can still start from `fieldScanner.parsingIndex` directly (you may need to store an index for the temporary usage) and prevent the index writing operation when the result is not clear yet.

Ray Lin[:ralin]

Assignee

Comment 21

•

7 years ago

(In reply to Sean Lee [:seanlee][:weilonge] from comment #20) > I am fine with the current if-condition form since the continous pattern > only in two fields at most. Got it, I thought you're asking why I didn't do parsing in a loop. :P > What my concern is more about the index operation. The index pushing or > poping is unnecessary if we change the fieldNames at once. (Please see my > other comments) > Once the pattern matching is failed, the parser can still start from > `fieldScanner.parsingIndex` directly (you may need to store an index for the > temporary usage) and prevent the index writing operation when the result is > not clear yet. I don't have certain preference, but the following is my considerations doing that: pros: - can early return at any point we're confident about the final result. - the fieldScanner can truly reflect current states. If we pass it into sub-tasks or functions, it's clear to every piece of code what position we are at now, and everyone can make use of whole fieldScanner without worrying about if any other variables are keeping a different cursor out there of callsite. Also it's more honest to transition, if we see fieldScanner as a FSM. cons: - unnecessary index operation.

Sean Lee [:seanlee][:weilonge]

Reporter

Comment 22

•

7 years ago

(In reply to Ray Lin[:ralin] from comment #21) > pros: > - can early return at any point we're confident about the final result. I am fine with changing the field name if you think it's confident, but my concern is about the unconfident part. > - the fieldScanner can truly reflect current states. If we pass it into > sub-tasks or functions, it's clear to every piece of code what position we > are at now, and everyone can make use of whole fieldScanner without worrying > about if any other variables are keeping a different cursor out there of > callsite. Also it's more honest to transition, if we see fieldScanner as a > FSM. If a sub-task needs the current window (it means it's not finished yet), we can pass another index for this case. If you are talking about `_isExpirationYearLikely` and `_isExpirationMonthLikely` cases, I think we can only pass `detail` (which is from `fieldScanner.getFieldDetailByIndex(fieldScanner.parsingIndex);`) rather than the whole fieldScanner. Probably there are some cases that I did not consider. Please let me know. Thanks. > > cons: > - unnecessary index operation. Since you did some `fieldScanner.parsingIndex++` in your patch, I wonder we should have an agreement that how to use it. `fieldScanner.parsingIndex` is also for determining `parsingFinished`. If we do some temporary index forwarding without restoring once the parsing failed, it probably makes the result incorrect or terminates the parsing process in a temporary situation.

Ray Lin[:ralin]

Assignee

Comment 23

•

7 years ago

(In reply to Sean Lee [:seanlee][:weilonge] from comment #22) > If a sub-task needs the current window (it means it's not finished yet), we > can pass another index for this case. > If you are talking about `_isExpirationYearLikely` and > `_isExpirationMonthLikely` cases, I think we can only pass `detail` (which > is from `fieldScanner.getFieldDetailByIndex(fieldScanner.parsingIndex);`) > rather than the whole fieldScanner. > Probably there are some cases that I did not consider. Please let me know. > Thanks. I think both are fine in current cases. Not sure, but it make more sense to me that the fieldScanner will carry more information than mere index in the future, the information can be some states across fields or even across sections that might be useful in sub-tasks. > Since you did some `fieldScanner.parsingIndex++` in your patch, I wonder we > should have an agreement that how to use it. `fieldScanner.parsingIndex` is > also for determining `parsingFinished`. If we do some temporary index > forwarding without restoring once the parsing failed, it probably makes the > result incorrect or terminates the parsing process in a temporary situation. yeh, I agree. It's dangerous if we forgot restore the index, though I feel like explicitly rewind the index somehow could be self-explanatory to represent a section of parsing failure.

Comment hidden (mozreview-request)

Ray Lin[:ralin]

Assignee

Comment 25

•

7 years ago

mozreview-review-reply

Comment on attachment 8907926 [details] Bug 1392940 - Add expiration date parser to detect cc-exp family fields. https://reviewboard.mozilla.org/r/179606/#review190380 Thanks for the review :D > Please add JSDoc for this parser like other parsers. added > nit: I think it's worth to include `CreditCard` in the function name, e.g. _parseCreditCardExpirationDateFields > or > _parseCreditCardExpDateFields fixed, should we also re-name the rest of function names for Address serial? (I think both collection types are at the same level to us) > Does it miss an `input` element check since the comment does mention the input type? We do eligible filter at the first place, perhaps we don't need another check here. I don't know, but it seems to me we're based on the same assumption in the other parsers. > any usage of the argument `i`? nope, removed

Comment hidden (mozreview-request)

Sean Lee [:seanlee][:weilonge]

Reporter

Comment 27

•

7 years ago

mozreview-review

Comment on attachment 8907926 [details] Bug 1392940 - Add expiration date parser to detect cc-exp family fields. https://reviewboard.mozilla.org/r/179606/#review191776 LGTM! Thanks for the effort. After having an offline discussion, the loop style parser (`every` or `while`) is replaced with if-condition style since there are only two types should be handled. ::: browser/extensions/formautofill/FormAutofillHeuristics.jsm:315 (Diff revision 5) > + * Return true if we observe the trait of month select in > + * the current element. > + */ > + _isExpirationMonthLikely(fieldScanner) { > + const detail = fieldScanner.getFieldDetailByIndex(fieldScanner.parsingIndex); > + const element = detail.elementWeakRef.get(); Passing `element` is enough (low coupling) unless you want to check `parsingFinished` in `_isExpirationYearLikely` and `_isExpirationMonthLikely`. Either way is fine with me. ::: browser/extensions/formautofill/FormAutofillHeuristics.jsm:346 (Diff revision 5) > + * Return true if we observe the trait of year select in > + * the current element. > + */ > + _isExpirationYearLikely(fieldScanner) { > + const detail = fieldScanner.getFieldDetailByIndex(fieldScanner.parsingIndex); > + const element = detail.elementWeakRef.get(); Same as above ::: browser/extensions/formautofill/test/unit/head.js:137 (Diff revision 5) > Assert.equal(forms.length, testPattern.expectedResult.length, "Expected form count."); > > forms.forEach((form, formIndex) => { > let formInfo = FormAutofillHeuristics.getFormInfo(form); > do_print("FieldName Prediction Results: " + formInfo.map(i => i.fieldName)); > + do_print("FieldName Expected Results: " + testPattern.expectedResult[formIndex].map(i => i.fieldName)); Remain one space only.

Attachment #8907926 - Flags: review?(selee) → review+

Ray Lin[:ralin]

Assignee

Comment 28

•

7 years ago

(In reply to Sean Lee [:seanlee][:weilonge] from comment #27) > Comment on attachment 8907926 [details] > Bug 1392940 - Add expiration date parser to detect cc-exp family fields. Thank you Sean for the review and the clear explanation during discussion. I'll update the patch as your suggestion soon later.

Ray Lin[:ralin]

Assignee

Comment 29

•

7 years ago

(In reply to Sean Lee [:seanlee][:weilonge] from comment #27) > Comment on attachment 8907926 [details] > Bug 1392940 - Add expiration date parser to detect cc-exp family fields. A quick question before pushing new patch: > Remain one space only. There I was thinking to make two array aligned on the same offset to increase readability in case there's a long list to compare with. Do you think it's fine to keep the spaces? Thanks

Flags: needinfo?(selee)

Sean Lee [:seanlee][:weilonge]

Reporter

Comment 30

•

7 years ago

(In reply to Ray Lin[:ralin] from comment #29) > (In reply to Sean Lee [:seanlee][:weilonge] from comment #27) > > Remain one space only. > There I was thinking to make two array aligned on the same offset to > increase readability in case there's a long list to compare with. Do you > think it's fine to keep the spaces? Thanks I am fine with it. Thanks.

Flags: needinfo?(selee)

Comment hidden (mozreview-request)

Matthew N. [:MattN]

Comment 32

•

7 years ago

mozreview-review

Comment on attachment 8907926 [details] Bug 1392940 - Add expiration date parser to detect cc-exp family fields. https://reviewboard.mozilla.org/r/179606/#review195186

Attachment #8907926 - Flags: review?(MattN+bmo) → review+

Comment hidden (mozreview-request)

Ray Lin[:ralin]

Assignee

Comment 34

•

7 years ago

Thank you MattN :D

Keywords: checkin-needed

Pulsebot

Comment 35

•

7 years ago

Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/autoland/rev/3c25526c7587 Add expiration date parser to detect cc-exp family fields. r=MattN,seanlee

Keywords: checkin-needed

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 36

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/3c25526c7587

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox58: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla58

Matthew N. [:MattN]

Updated

•

7 years ago

Flags: needinfo?(MattN+bmo)