Closed Bug 1348751 Opened 7 years ago Closed 7 years ago

[Form Autofill] A utility library for handling full name and separated names

Categories

(Toolkit :: Form Manager, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: lchang, Assigned: lchang)

References

Details

(Whiteboard: [form autofill:M2])

Attachments

(1 file)

When handling the autofill profiles, we'll need a utility library for transforming a full name into corresponding separated names according to the locale, and vice versa.
Whiteboard: [form autofill:M2]
See Also: → 530257
Hi Matt,

I encountered a problem about the encoding when writing this patch. According to bug 530257, "Cu.import" assumes all JSM files are encoded in ISO-Latin-1. When handling a Chinese or Korean name, however, I need to look up multi-char surnames in a pre-defined data structure. It causes me unable to define those data in the same JSM file. Thus, I put them in another JSON file and load them via XHR instead. I'm not sure if it's reasonable, and also I have no idea where to put the JSON file so it's placed under "content" for now.

Since we're targeting en-US only in MVP, I can simply remove the Chinese and Korean database if you think we don't need to handle those right now or we can wait until bug 530257 fixed.

Could you please take a look? Thanks.
Comment on attachment 8849495 [details]
Bug 1348751 - [Form Autofill] A utility library for handling full name and separated names,

https://reviewboard.mozilla.org/r/122282/#review127492

::: browser/extensions/formautofill/FormAutofillNameUtils.jsm:9
(Diff revision 1)
> +// Cu.import loads jsm files based on ISO-Latin-1 for now (see bug 530257).
> +// However, the references about name parts include multi-byte characters.
> +// Thus, we will use XHR to load the references instead.

Does loadSubScript have the same problem? It's synchronous so may make things a little easier and it means we can use regular JS with license and other comments which we can't use with JSON.

::: browser/extensions/formautofill/FormAutofillNameUtils.jsm:213
(Diff revision 1)
> +
> +    let xhr = Cc["@mozilla.org/xmlextras/xmlhttprequest;1"]
> +                .createInstance(Ci.nsIXMLHttpRequest);
> +    xhr.open("GET", NAME_REFERENCES);
> +    xhr.addEventListener("load", () => {
> +      let data = JSON.parse(xhr.responseText);

You can set xhr.responseType ="json" to have it parsed for you in `xhr.response`. You could also use `fetch` since it uses promises.

::: browser/extensions/formautofill/FormAutofillNameUtils.jsm:214
(Diff revision 1)
> +    let xhr = Cc["@mozilla.org/xmlextras/xmlhttprequest;1"]
> +                .createInstance(Ci.nsIXMLHttpRequest);
> +    xhr.open("GET", NAME_REFERENCES);
> +    xhr.addEventListener("load", () => {
> +      let data = JSON.parse(xhr.responseText);
> +      for(let key in data) {

Nit: missing space

::: browser/extensions/formautofill/content/nameReferences.json:2
(Diff revision 1)
> +  "NAME_PREFIXES": [
> +    "1lt", "1st", "2lt", "2nd", "3rd", "admiral", "capt", "captain", "col",
> +    "cpt", "dr", "gen", "general", "lcdr", "lt", "ltc", "ltg", "ltjg", "maj",
> +    "major", "mg", "mr", "mrs", "ms", "pastor", "prof", "rep", "reverend",
> +    "rev", "sen", "st"
> +  ],

Please put one entry per line so that interdiffs are much easier to read. I don't think we need to worry about saving space in such a file.
Comment on attachment 8849495 [details]
Bug 1348751 - [Form Autofill] A utility library for handling full name and separated names,

https://reviewboard.mozilla.org/r/122282/#review127492

> Does loadSubScript have the same problem? It's synchronous so may make things a little easier and it means we can use regular JS with license and other comments which we can't use with JSON.

`loadSubScript` works! I should have tested it. I'll update my patch accordingly.
Hi Matt,

The patch has been updated.
Comment on attachment 8849495 [details]
Bug 1348751 - [Form Autofill] A utility library for handling full name and separated names,

https://reviewboard.mozilla.org/r/122282/#review130954

Hi Luke, I looked at this patch a while ago and then wanted to talk to you about it live but haven't had a chance yet. Can you please include comments with references to where you got more of the data and algorithms from? As-is it's very hard for me to review since it requires understanding many character sets and locales that I have no knowledge of. If things are copied from somewhere then I can at least ensure that things are ported consistently and need to worry less about accuracy of the data. I also think that it may be a bit of a maintenance burden as-is since I wouldn't know where to update many of the lists from.
Attachment #8849495 - Flags: review?(MattN+bmo)
Hi Matt,

I updated my patch. Thanks.
Comment on attachment 8849495 [details]
Bug 1348751 - [Form Autofill] A utility library for handling full name and separated names,

https://reviewboard.mozilla.org/r/122282/#review135090

Sorry for the delay. This looks pretty good. I still need to look closer at the algorithm ports but figured I would give this feedback for now.

::: browser/extensions/formautofill/FormAutofillNameUtils.jsm:16
(Diff revision 6)
> +// Thus, we use |loadSubScript| to load the references instead.
> +const NAME_REFERENCES = "chrome://formautofill/content/nameReferences.js";
> +
> +this.EXPORTED_SYMBOLS = ["FormAutofillNameUtils"];
> +
> +// FormAutofillNameUtils is initially tranlated from

typo: translated

::: browser/extensions/formautofill/FormAutofillNameUtils.jsm:26
(Diff revision 6)
> +  WHITESPACE: [
> +    "\u0009", // CHARACTER TABULATION
> +    "\u000A", // LINE FEED (LF)

This should have a separate reference link since it's not from autofill_data_util.cc but really I would expect that we already have this in m-c somewhere. Did you check for an existing API (probably in DOM)? Maybe ask in #developers?

I found u_isWhitespace in ICU but I don't know if that is or can be easily exposed through XPCOM and maybe the overhead of that is worse.

::: browser/extensions/formautofill/FormAutofillNameUtils.jsm:198
(Diff revision 6)
> +    }
> +
> +    return nameParts;
> +  },
> +
> +  loadReferences() {

Nit: How about just `init`? That seems to be the common practice

::: browser/extensions/formautofill/test/unit/test_nameUtils.js:2
(Diff revision 6)
> +/**
> + * Tests FormAutofillNameUtils object.

Can you add a separate test file for `_isCJKname` and copy the test cases from https://chromium.googlesource.com/chromium/src/+/cbc7bdbd3e2ca861dc1d0bb63dfc3fcabb1f9fb9/components/autofill/core/browser/autofill_data_util_unittest.cc#30

::: browser/extensions/formautofill/test/unit/test_nameUtils.js:10
(Diff revision 6)
> +const TESTCASES = [
> +  {
> +    description: "First name + Last name",
> +    fullName: "John Doe",
> +    nameParts: {
> +      given: "John",
> +      middle: "",
> +      family: "Doe",
> +    },

It may be easiest to copy the test cases from https://chromium.googlesource.com/chromium/src/+/cbc7bdbd3e2ca861dc1d0bb63dfc3fcabb1f9fb9/components/autofill/core/browser/autofill_data_util_unittest.cc#78 and reference that file here.

::: browser/extensions/formautofill/test/unit/test_nameUtils.js:156
(Diff revision 6)
> +    do_print("Starting testcase: " + testcase.description);
> +    let name = FormAutofillNameUtils.joinNameParts(testcase.nameParts);

There are also join tests at https://chromium.googlesource.com/chromium/src/+/cbc7bdbd3e2ca861dc1d0bb63dfc3fcabb1f9fb9/components/autofill/core/browser/autofill_data_util_unittest.cc#155
Attachment #8849495 - Flags: review?(MattN+bmo)
Comment on attachment 8849495 [details]
Bug 1348751 - [Form Autofill] A utility library for handling full name and separated names,

https://reviewboard.mozilla.org/r/122282/#review135090

> This should have a separate reference link since it's not from autofill_data_util.cc but really I would expect that we already have this in m-c somewhere. Did you check for an existing API (probably in DOM)? Maybe ask in #developers?
> 
> I found u_isWhitespace in ICU but I don't know if that is or can be easily exposed through XPCOM and maybe the overhead of that is worse.

I'll check if we have an existing API with DOM team members. However, I'm not sure if our implementation should align with Chromium's list [1] because there might be a reason that they use a list of characters instead of `uscript_getScript`.

[1] https://cs.chromium.org/chromium/src/base/strings/string_util_constants.cc?l=9&rcl=b861deff77abecff11ae6a9f6946e9cc844b9817

> It may be easiest to copy the test cases from https://chromium.googlesource.com/chromium/src/+/cbc7bdbd3e2ca861dc1d0bb63dfc3fcabb1f9fb9/components/autofill/core/browser/autofill_data_util_unittest.cc#78 and reference that file here.

Yeah, that's much easier. Thanks for the reminder.

> There are also join tests at https://chromium.googlesource.com/chromium/src/+/cbc7bdbd3e2ca861dc1d0bb63dfc3fcabb1f9fb9/components/autofill/core/browser/autofill_data_util_unittest.cc#155

Since the "join" parts overlap the "split" parts, I'll merge these two parts into our TEXTCASES.
Comment on attachment 8849495 [details]
Bug 1348751 - [Form Autofill] A utility library for handling full name and separated names,

https://reviewboard.mozilla.org/r/122282/#review135090

> I'll check if we have an existing API with DOM team members. However, I'm not sure if our implementation should align with Chromium's list [1] because there might be a reason that they use a list of characters instead of `uscript_getScript`.
> 
> [1] https://cs.chromium.org/chromium/src/base/strings/string_util_constants.cc?l=9&rcl=b861deff77abecff11ae6a9f6946e9cc844b9817

Looks like there isn't an existing API in JavaScript. Many modules either have their own list or use `/\s/.test()`. I would prefer to keep leveraging Chromium's list for two reasons:

1. The algorithm could better align with Chromium's.
2. To expose an API for testing whitespaces might be a bit overkilled.

What do you think?
Comment on attachment 8849495 [details]
Bug 1348751 - [Form Autofill] A utility library for handling full name and separated names,

https://reviewboard.mozilla.org/r/122282/#review136506

::: browser/extensions/formautofill/FormAutofillNameUtils.jsm:129
(Diff revision 7)
> +    // characters (logographs), which constitute the "CJK Unified Ideographs"
> +    // block in Unicode, also referred to as Unihan. Korean names are usually
> +    // spelled out in the Korean alphabet (Hangul), although they do have a Han
> +    // equivalent as well.
> +
> +    let reCJK = new RegExp("[" + this.CJK_RANGE.join("") + "]");

Since this is used for both splitting and joining and it's non-trivial, I think it would make sense to create the regexp once and re-use it in future calls. Perhaps we should do it in `init`, otherwise we could use defineLazyGetter in `init`.

::: browser/extensions/formautofill/FormAutofillNameUtils.jsm:129
(Diff revision 7)
> +    let reCJK = new RegExp("[" + this.CJK_RANGE.join("") + "]");
> +    let previousWasCJK = false;

I think you should use the "u" flag as the second argument for proper unicode support but I'm not sure.

::: browser/extensions/formautofill/test/unit/test_nameUtils.js:10
(Diff revision 7)
> +"use strict";
> +
> +Cu.import("resource://gre/modules/Task.jsm");
> +Cu.import("resource://formautofill/FormAutofillNameUtils.jsm");
> +
> +// Test cases is initially copied from

Nit: Remove " is"
Attachment #8849495 - Flags: review?(MattN+bmo) → review+
Blocks: 1359892
Pushed by lchang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8d6b492ebb36
[Form Autofill] A utility library for handling full name and separated names, r=MattN
Thanks for the notice.
Flags: needinfo?(lchang)
Pushed by lchang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/93378ffa235a
[Form Autofill] A utility library for handling full name and separated names, r=MattN
https://hg.mozilla.org/mozilla-central/rev/93378ffa235a
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
As per San-Francisco meeting with :vchen, marking this bug as qe-.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: