Handle new localization API in Preferences Search

RESOLVED FIXED in Firefox 61

Status

()

enhancement
P3
normal
RESOLVED FIXED
a year ago
9 months ago

People

(Reporter: gandalf, Assigned: gandalf)

Tracking

(Depends on 1 bug)

unspecified
Firefox 61
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox61 fixed)

Details

Attachments

(1 attachment, 1 obsolete attachment)

Spin-off from bug 1419547. Currently, in order to make Preferences searchable, we add a `searchkeywords` attribute like this:

```
<button searchkeywords="&dtd_entity1; &dtd_entity2; &dtd_entity3;">Containers</button>
```

This doesn't translate well to Fluent because it requires a localization to be placed into HTML element before it can be searched.

Instead, I'd like to brainstorm a cleaner way to handle searchability that doesn't require that much DOM impact. 

We can still store L10nIDs of localizable terms, but we would handle the terms in a map stored in memory. Like this:

```html

<button data-l10n-searchids="msg-key1 msg-key2 msg-key3">

```

and then in JS we'd do:

```js
// Oversimiplified pseudocode

let termMap = new Map();

let searchableElements = document.querySelectorAll("[data-l10n-searchids");

for (let element of searchableElements) {
  let l10nIds = element.getAttribute("data-l10n-searchids").split(' ');
  l10n terms = await document.l10n.formatValues(l10nIds);
  for (let term of terms) {
    termMap.set(term, element);
  }
}
```

This way we could build the index lazily after first-paint to not block on it, and maybe cache it (invalidate on language change/update?).

We could optimize the code to do only one `formatValues` query for all l10nIds as well.

:rchein - does it seem like a good idea to you?
:stas - do you think we could add a way to format values for searching that doesn't require variable passing and doesn't fallback?

Other ideas on how to implement it?
(Assignee)

Updated

a year ago
Blocks: 1415730
Priority: -- → P3
(Assignee)

Updated

a year ago
Flags: needinfo?(stas)
Flags: needinfo?(rchien)
I'm trying to find more information on how searchkeywords attribute works. Is it a custom thing? XUL-specific? Is it read only when the document is parsed? Does changing its value update the available search terms?

If it's dynamic, why wouldn't we make it a localizable attribute and let localizers set it?
Flags: needinfo?(stas)
In IRC, Flod and Axel didn't like the idea of letting localizers set this attribute.

Looking at https://searchfox.org/mozilla-central/source/browser/components/preferences/in-content/findInPage.js#384, we shouldn't probably create the termMap manually but rather set the attribute with localized values. Your code can be easily adapted to do it. Another option might be to add a non-localizable FTL file with:

    button-id
        .searchkeywords = { msg-key1 } { msg-key2 } { msg-key3 }
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #0)

> :stas - do you think we could add a way to format values for searching that
> doesn't require variable passing and doesn't fallback?

Can you explain why we'd need this set of restrictions on the formatting?
(Assignee)

Comment 4

a year ago
> Can you explain why we'd need this set of restrictions on the formatting?

Because in my approach proposed in comment 0, the search code has to retrieve the values of all l10n-ids needed for search terms but it doesn't pass the variables for them.

If you just use `formatValues` or `formatMessages` here, we'd attempt to fallback on the next locale if we can't format in the previous, right?
(Assignee)

Comment 5

a year ago
(In reply to Staś Małolepszy :stas from comment #2)
> In IRC, Flod and Axel didn't like the idea of letting localizers set this
> attribute.
> 
> Looking at
> https://searchfox.org/mozilla-central/source/browser/components/preferences/
> in-content/findInPage.js#384, we shouldn't probably create the termMap
> manually but rather set the attribute with localized values. Your code can
> be easily adapted to do it. Another option might be to add a non-localizable
> FTL file with:
> 
>     button-id
>         .searchkeywords = { msg-key1 } { msg-key2 } { msg-key3 }

That's the approach I used in bug 1419547 so far, but it feels weird to me that we would require to maintain separate FTL files just to be able to search through the UI. Esp. that such file would not be exposed to localizers. Feels like a one-off hack around.

I think listing l10n-ids and using search code to retrieve the terms for them seems like a cleaner approach. It also has an advantage of allowing us to do more than that.
For example, the JS code could then retrieve and index the terms both in the current localization and en-US (which is what, for example, Apple does) allowing you to find polish "Konto" by searching for "Account" which may be useful in case the person uses localized Firefox, but finds some instructions/support in English.
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #4)
> > Can you explain why we'd need this set of restrictions on the formatting?
> 
> Because in my approach proposed in comment 0, the search code has to
> retrieve the values of all l10n-ids needed for search terms but it doesn't
> pass the variables for them.
> 
> If you just use `formatValues` or `formatMessages` here, we'd attempt to
> fallback on the next locale if we can't format in the previous, right?

Sorry, perhaps I should have asked: what for? :)  Do you mean that we shouldn't fallback nor resolve variables? What if one of the messages used for search keywords requires an external argument?

(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #5)
> For example, the JS code could then retrieve and index the terms both in the
> current localization and en-US (which is what, for example, Apple does)
> allowing you to find polish "Konto" by searching for "Account" which may be
> useful in case the person uses localized Firefox, but finds some
> instructions/support in English.

Funnily enough, I gave a similar example in IRC earlier today :) Microsoft does that as well and I find it very useful. That's a different bug, though.
(Assignee)

Comment 7

a year ago
> Sorry, perhaps I should have asked: what for? :)  Do you mean that we shouldn't fallback nor resolve variables? What if one of the messages used for search keywords requires an external argument?

I mean that if we try to go for the approach proposed by me in comment 0, we need a way to format messages without passing external arguments, without fallback and without erroring out on them.

I believe that this is a different API from `formatMessages` in that case.

Such an API would return plain strings like `"You have { $count } unread emails"` and let the search engine tokenize them, remove the ones starting with `$` (or everything within placeables) and add to their db.

Does it make sense?
Interesting. This goes counter to the rule that you must use format* to retrieve translations from MessageContext. You'd also need to do something about select expressions (with and without selectors). My initial reaction is to be opposed.
Counter-proposal:

Add an attribute if the .textValue and/or which attributes of an element should be searchable.

Then we're automagically searching in the formatted values, and I think that's good that way.

OTH, if we're going through l10n-ids, should we not go through all possible code paths, instead of just one? Aka, if my string is "one bookmark" or "{n} bookmarks", should I be able to search for "one b"?
Alternative proposal:

```
<button data-search-elements="#element1, #element2, #element2">
```

and then in JS:

```
let searchTerms = new (Weak?)Map();

let elements = document.querySelectorAll(dataSearchElements);

let keys = [];
for (let element of elements) {
  keys.push(document.l10n.getAttributes(element));
}
let msgs = document.l10n.formatMessages(keys);

for (let i = 0; i < elements.length; i++) {
  let element = elements[i];
  let l10n = msgs[i];
  let terms = extractTermsFrom L10n(l10n);
  for (let term of terms) {
    searchTerms.set(term, element);
  }
}
```

So, basically search attribute becomes a query selector for elements, then we fetch their l10n-id/l10n-attrs and format them, and add the result as search terms.
I'm slowly getting a hang of this. The searchkeywords can be pretty much anything that's showing up in a UI element that's triggered from the button-or-the-like. It could even be a dialog not part of the current DOM.

The way I reconstruct the intent by now is: if you clicked on this, you might see a string that you searched for.

And that doesn't need to be in the current DOM.

Which leads me towards "all possible paths of a non-simple fluent message".
> Which leads me towards "all possible paths of a non-simple fluent message".

My understanding is that this is an edge case that can be added later if needed. The search engine will probably not want to index short words like "a", "the", "is" etc. and focus on longer, more unique terms like "privacy", "search", "container", "process" etc.

Those terms should be present in each of the variants, since the variants will usually remain constant, with just differences in wording around the number, gender etc.

I'd prefer not to slow down the migration by trying to reinvent Fluent API to allow us to retrieve all possible variants of the message, and I believe that since the current approach does not handle any of the strings that use any logic (since it only handles DTD), it's not a regression to support a resolved string only.
I was going to write exactly this, thanks Zibi.
Not sure if there's a lot of difference between a "defaultTextNodes" API and "allVariantsTextNodes" API.
> Not sure if there's a lot of difference between a "defaultTextNodes" API and "allVariantsTextNodes" API.

I think there is, but in comment 10 I'm exploring approach proposal that doesn't require any chances to the API as well.
(In reply to Axel Hecht [:Pike] from comment #14)
> Not sure if there's a lot of difference between a "defaultTextNodes" API and
> "allVariantsTextNodes" API.

There's a difference. The former always returns a single string and is what the current MessageContext.format* methods use. The latter is closer to working with the full AST. We can make it happen but I don't think we should. MessageContext has been designed as a black box and the format* methods are the consequence of this. We shouldn't give developers ways to tinker with internal translation data other than via fluent-syntax.
Jared, can you share your feedback on the proposals here?
Flags: needinfo?(jaws)
I didn't involved into implementation of search highlight. I believe Jared / Mike Conley can provide better answer.
Flags: needinfo?(rchien)
I like the idea of storing the Fluent IDs in the attribute then we can can maintain a weakmap that maps element -> search IDs, which we can reference when we are searching and come across an element that has the "data-l10n-searchids" attribute. I like that this also allows us to eventually (not required for v0) to also search English strings as well.

I am aware that we can make the search here better performing as well as handle more advanced search usages, but I'd like to handle that as part of a separate project.
Flags: needinfo?(jaws)
Comment hidden (mozreview-request)
I finally took a stab at this since we're down to 3 files in preferences.xul that use DTD and two of them heavily use DTD for searchkeywords.

This patch adds a new attribute - `search-l10n-ids` which allow the user to list l10n-ids that will match this element on search.

I believe it'll fully replace `searchkeywords` and I'd like to remove the latter once I migrate all uses of it to `search-l10n-ids`.

it fits nicely into the ideas we circulated here as it can easily be extended to handle another language and can react to language changes etc.

There's one open design decision to be made, which is more of a stylistic/performance choice and since it doesn't affect the API I'd like to leave it for when I'll migrate the uses and can measure perf impact:

In this approach I perform a single `document.l10n.formatMessages` per element with `search-l10n-ids`. In theory, I could also collect all elements with this attribute and perform a single formatMessages on all of them at once.

This would reduce the number of async calls of formatMessages from ~10 (the number of elements with searchkeywords) to 1, but I'm not sure if it'd be even beneficial as looking at the current code seems like we're trying to enforce more async behavior to prevent jank I believe.

I'm setting it to block bug 1419547 and bug 1446180, since I'd prefer to finish those patches only when this lands.
Assignee: nobody → gandalf
Blocks: 1419547, 1446180
No longer blocks: 1415730
Status: NEW → ASSIGNED
(Assignee)

Comment 22

a year ago
mozreview-review
Comment on attachment 8960031 [details]
Bug 1420761 - Handle new localization API in Preferences Search.

https://reviewboard.mozilla.org/r/228800/#review234486

::: browser/components/preferences/in-content/findInPage.js:317
(Diff revision 1)
>              Services.telemetry.keyedScalarAdd("preferences.search_query", this.query, 1);
>            }, 1000);
>          }
>        }
>      } else {
> -      document.getElementById("sorry-message").textContent = "";
> +      document.getElementById("sorry-message-query").textContent = "";

This is an omission from the previous patch in bug 1445084 - we cleaned up the outer message rather than the query string only.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 25

a year ago
mozreview-review
Comment on attachment 8960031 [details]
Bug 1420761 - Handle new localization API in Preferences Search.

https://reviewboard.mozilla.org/r/228800/#review234804

This looks good. I'm only marking r- because clearing review isn't obvious that this was looked at already. I'd like to look at it again after you add the tests.

::: browser/components/preferences/in-content/findInPage.js:317
(Diff revision 3)
>              Services.telemetry.keyedScalarAdd("preferences.search_query", this.query, 1);
>            }, 1000);
>          }
>        }
>      } else {
> -      document.getElementById("sorry-message").textContent = "";
> +      document.getElementById("sorry-message-query").textContent = "";

Hm... we should probably have a test for this.

::: browser/components/preferences/in-content/findInPage.js:380
(Diff revision 3)
> +      // Searching some elements, such as xul:button, buttons to open subdialogs
> +      // using l10n ids.
> +      let keywordsResult =
> +        nodeObject.hasAttribute("search-l10n-ids") &&
> +        await this.matchesSearchL10nIDs(nodeObject, searchPhrase);

We should have an automated test that covers the "search-l10n-ids" support. I can't find one for "searchkeywords" but we should have had one for that.

You can add to the tests in browser_search_within_preferences_1.js or browser_search_within_preferences_2.js.

::: browser/components/preferences/in-content/findInPage.js:481
(Diff revision 3)
> +      // The result is an array of arrays of l10n ids and optionally attribute names.
> +      //
> +      // Example: [["containers-add-button", "label"], ["user-context-personal"]]
> +      const refs = nodeObject.getAttribute("search-l10n-ids")
> +        .split(",")
> +        .map(s => s.trim().split(".")).filter(s => s[0].length > 0);

This filters out string IDs that start with a period. How likely is that going to happen?
Attachment #8960031 - Flags: review?(jaws) → review-
Comment hidden (mozreview-request)
(Assignee)

Comment 27

a year ago
mozreview-review-reply
Comment on attachment 8960031 [details]
Bug 1420761 - Handle new localization API in Preferences Search.

https://reviewboard.mozilla.org/r/228800/#review234804

> Hm... we should probably have a test for this.

Added to the empty search result test in `browser_search_within_preferences_1.js`

> We should have an automated test that covers the "search-l10n-ids" support. I can't find one for "searchkeywords" but we should have had one for that.
> 
> You can add to the tests in browser_search_within_preferences_1.js or browser_search_within_preferences_2.js.

Added.

> This filters out string IDs that start with a period. How likely is that going to happen?

Identifier in Fluent can't start with `.`. It'll just try to `formatMessage` for an empty ID and will report it as `console.warn` for `Missing translation for string ""`. https://github.com/projectfluent/fluent/blob/master/spec/fluent.ebnf#L25
Comment hidden (mozreview-request)

Comment 29

a year ago
mozreview-review
Comment on attachment 8960031 [details]
Bug 1420761 - Handle new localization API in Preferences Search.

https://reviewboard.mozilla.org/r/228800/#review234918

::: browser/components/preferences/in-content/tests/browser_search_within_preferences_2.js:99
(Diff revision 5)
> +    let searchCompletedPromise = BrowserTestUtils.waitForEvent(
> +        gBrowser.contentWindow, "PreferencesSearchCompleted", evt => evt.detail == query);
> +    EventUtils.sendString(query);
> +    await searchCompletedPromise;
> +
> +

please remove one of these blank lines
Attachment #8960031 - Flags: review?(jaws) → review+

Comment 30

a year ago
mozreview-review-reply
Comment on attachment 8960031 [details]
Bug 1420761 - Handle new localization API in Preferences Search.

https://reviewboard.mozilla.org/r/228800/#review234804

> Identifier in Fluent can't start with `.`. It'll just try to `formatMessage` for an empty ID and will report it as `console.warn` for `Missing translation for string ""`. https://github.com/projectfluent/fluent/blob/master/spec/fluent.ebnf#L25

Yes, I think this will hide potential issues. Wouldn't it be better to log the warning and have someone file the bug for the fluent error? Otherwise it may go unnoticed.
> Yes, I think this will hide potential issues. Wouldn't it be better to log the warning and have someone file the bug for the fluent error? Otherwise it may go unnoticed.

Currently, if you try to do `search-l10n-ids=".label"` it will log in browser console with `Missing translation ""`.

Do you want me to add a console.error with `l10n id must be a string`?

Also, I may need some help with the mochitest failure: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b27775e7874b564ef25ded2b89a1c9da79f907c3&selectedJob=169042787

It seems like something funky is going on and the test fails at [0] on `header-searchResults` being visible.

I tested central and if I set the relevant functions to be async it already fails [1].

As far as I can tell it doesn't hide `header-searchResults` if [2] is async, but it's a bit tricky to test because the loop is forcefully async and both functions have to be turned async at the same time since they call each other.

I verified that the event `PreferencesSearchCompleted` is called at the right time, but it seems that after in the test the query is emptied it keeps the `header-searchResults` visible for some reason.

My first blind guess is that something about thresholds, rAF and async calls causes it, but I have no idea how to debug it.

Would you be able to help me debug it. It's trivial to reproduce with the patch in [1] and launching the browser_search_within_preferences_1 mochitest.

[0] https://searchfox.org/mozilla-central/rev/877c99c523a054419ec964d4dfb3f0cadac9d497/browser/components/preferences/in-content/tests/browser_search_within_preferences_1.js#132
[1] https://pastebin.mozilla.org/9080407
[2] https://searchfox.org/mozilla-central/source/browser/components/preferences/in-content/findInPage.js#277
Flags: needinfo?(jaws)
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #31)
> > Yes, I think this will hide potential issues. Wouldn't it be better to log the warning and have someone file the bug for the fluent error? Otherwise it may go unnoticed.
> 
> Currently, if you try to do `search-l10n-ids=".label"` it will log in
> browser console with `Missing translation ""`.
> 
> Do you want me to add a console.error with `l10n id must be a string`?

I don't know when we would ever have `search-l10n-ids=".label"`. This is all static and goes through the standard Mozilla code review process. My question here is that it seems this code is trying to protect against a case that in practice will never happen :)
 
> Also, I may need some help with the mochitest failure:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=b27775e7874b564ef25ded2b89a1c9da79f907c3&selectedJob=1
> 69042787
> 
> It seems like something funky is going on and the test fails at [0] on
> `header-searchResults` being visible.
> 
> I tested central and if I set the relevant functions to be async it already
> fails [1].
> 
> As far as I can tell it doesn't hide `header-searchResults` if [2] is async,
> but it's a bit tricky to test because the loop is forcefully async and both
> functions have to be turned async at the same time since they call each
> other.
> 
> I verified that the event `PreferencesSearchCompleted` is called at the
> right time, but it seems that after in the test the query is emptied it
> keeps the `header-searchResults` visible for some reason.
> 
> My first blind guess is that something about thresholds, rAF and async calls
> causes it, but I have no idea how to debug it.
> 
> Would you be able to help me debug it. It's trivial to reproduce with the
> patch in [1] and launching the browser_search_within_preferences_1 mochitest.
> 
> [0]
> https://searchfox.org/mozilla-central/rev/
> 877c99c523a054419ec964d4dfb3f0cadac9d497/browser/components/preferences/in-
> content/tests/browser_search_within_preferences_1.js#132
> [1] https://pastebin.mozilla.org/9080407
> [2]
> https://searchfox.org/mozilla-central/source/browser/components/preferences/
> in-content/findInPage.js#277

I made some changes to this code in bug 1446657. Can you rebase on top of that and see if it still reproduces? The code cleanup in my patch makes this code a little easier to follow.
Flags: needinfo?(jaws)
(Assignee)

Updated

a year ago
Depends on: 1446657
Comment hidden (mozreview-request)
(Assignee)

Updated

a year ago
Attachment #8960031 - Attachment is obsolete: true
> My question here is that it seems this code is trying to protect against a case that in practice will never happen :)

Oh! I see! No, actually the reason I filter for refs[0] being null is this:

```
<elem search-l10n-ids="
    key1,
    key2.label,
    key3,
  ">
```

In this case, there's a dangling empty match after key3 and this filter removes it. I don't need it but then trying to place the dangling `,` will result in an empty l10n id.

Lmk what would you prefer :)

> I made some changes to this code in bug 1446657. Can you rebase on top of that and see if it still reproduces? The code cleanup in my patch makes this code a little easier to follow.

Yep! Your patch fixes it locally. Pushed to try. And since I rebased MR lost your r=, so resetting r?
Okay, I see now why you were checking for refs[0] being null. This is fine with me, thanks!

Comment 36

a year ago
mozreview-review
Comment on attachment 8960432 [details]
Bug 1420761 - Handle new localization API in Preferences Search.

https://reviewboard.mozilla.org/r/229214/#review235124
Attachment #8960432 - Flags: review?(jaws) → review+

Comment 37

a year ago
Pushed by zbraniecki@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a831178ed112
Handle new localization API in Preferences Search. r=jaws

Comment 38

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/a831178ed112
Status: ASSIGNED → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 61
You need to log in before you can comment on or make changes to this bug.