Figure out how to localize the interventions QueryScorer
Categories
(Firefox :: Address Bar, enhancement, P3)
Tracking
firefox75: wontfix
People
(Reporter: bugzilla, Unassigned)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
We should involve l10n to understand how to handle the matches with Fluent.
Reporter
Comment 1•4 years ago
Mike, did you end up meeting with the l10n team about this? iirc you mentioned a meeting last week.
Comment 2•4 years ago
We haven't talked yet, but we should. Hopefully next week?
Here are a few topics to cover:
CJKT
These languages use scripts where a single glyph can stand for a whole word, so the Levenshtein distance between the equivalents of "my house is broken" and "my browser is broken" can be as low as 1. Basically, we can't use Levenshtein for those scripts.
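To make the failure mode concrete, here's a minimal sketch (the example sentences are illustrative, not real localized phrases): a generic Levenshtein works on strings or on arrays of word tokens, and in a script where one glyph is a whole word, every character edit behaves like a word-level edit.

```javascript
// Classic dynamic-programming Levenshtein distance over any indexable
// sequence: a string (character level) or an array of tokens (word level).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      cur[j] = Math.min(
        prev[j] + 1,     // deletion
        cur[j - 1] + 1,  // insertion
        prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
    }
    prev = cur;
  }
  return prev[b.length];
}

// Character level: swapping one word costs several edits, so a
// typo-tolerant threshold like "distance <= 2" still rejects it.
const charDist = levenshtein("my house is broken", "my browser is broken");

// Word level: the same pair differs by exactly one edit. In a script
// where each glyph is a whole word, character distance behaves like
// this, so "distance <= 2" would match semantically unrelated phrases.
const wordDist = levenshtein(
  "my house is broken".split(" "),
  "my browser is broken".split(" ")
);
```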
Multiple Scripts
Some languages have multiple scripts, for example Japanese (a logographic script plus syllabaries) or Serbian (Latin and Cyrillic).
Impure use of Scripts
I wonder whether Vietnamese users type their proper script when searching, or whether they just use the closest plain Latin characters.
https://en.wikipedia.org/wiki/Vietnamese_alphabet
Multiple Languages in Searches
The search language will often not match the UI language: I use the German UI but search in English for tech terms. The Russian UI is widely used in the former Soviet Union, so in the Baltics I'd also expect a mix of searches in the local language, Russian, and English.
Mixed Language Searches
Overhearing conversations on Indian buses, folks in India freely mix English and their local language. I'd expect that to be the case for searches too, and I don't know how deterministic that mixing is.
I'm not convinced this is an exhaustive list; it's more like stuff I've picked up over time.
Comment 3•4 years ago
We can at least use exact string matching for languages where edit distance isn't well defined, or where we just don't implement it. We can also recognize phrases in multiple languages/scripts for any given locale/language -- for example, recognizing both German and English phrases for German-speaking locales, or both kana and kanji phrases for Japanese -- although I can see that leading to an explosion in the number of phrases we'd need to recognize. Along those lines, would it make sense to recognize English in all locales/languages?
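A sketch of the multi-language exact matching described above; the phrase lists and names here are made up for illustration, not proposed data:

```javascript
// Hypothetical per-locale phrase sets that include both the local
// language and English, since searches often mix the two.
const PHRASES_BY_LOCALE = new Map([
  ["de", new Set(["browser ist kaputt", "browser is broken"])],
  ["en", new Set(["browser is broken"])],
]);

// Exact matching after trivial normalization; no edit distance involved,
// so it's safe for scripts where Levenshtein isn't meaningful.
function matchesIntervention(locale, query) {
  const phrases = PHRASES_BY_LOCALE.get(locale);
  return !!phrases && phrases.has(query.trim().toLowerCase());
}
```

The obvious cost, as noted above, is that every supported phrase has to be enumerated in every language and script variant we want to recognize.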
Comment 4•4 years ago
Following up on the results of the discussion with l10n: we can proceed with our own object that defines matching keywords per locale (or per group of locales, like "en-XX"), or alternatively we can provide one global list for all locales to better support mixed-language matching. That object will be managed by us, not by localizers.
The source we pick for the keywords is critical; SUMO may be a good one. The source could also hint at whether it's better to group matchers by locale or just make one big blob.
The fuzzy matching algorithm may depend on the locale, though; for instance, we can use Levenshtein only for Western languages. If we go with a per-locale definition, it could also specify which fuzzy matcher to use among the available ones. If no fuzzy matcher is provided, we'll just do exact matching.
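The per-locale definition could be sketched like this. All names and the distance threshold are assumptions for illustration; this isn't the actual QueryScorer code:

```javascript
// Simple character-level Levenshtein distance (dynamic programming).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      cur[j] = Math.min(
        prev[j] + 1,
        cur[j - 1] + 1,
        prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)
      );
    }
    prev = cur;
  }
  return prev[b.length];
}

// Registry of available fuzzy matchers; a locale opts into one by name.
const MATCHERS = {
  levenshtein: (query, keyword) => levenshtein(query, keyword) <= 2,
};

// Hypothetical per-locale definitions: "en" uses fuzzy matching, "ja"
// omits the matcher name and therefore falls back to exact matching.
const LOCALE_DEFS = {
  en: { keywords: ["clear history"], fuzzy: "levenshtein" },
  ja: { keywords: ["履歴を消去"] },
};

function matches(locale, query) {
  const def = LOCALE_DEFS[locale];
  if (!def) return false;
  const fuzzy = MATCHERS[def.fuzzy];
  return def.keywords.some(k => (fuzzy ? fuzzy(query, k) : query === k));
}
```

The design choice here is that exact matching is the universal fallback, so a locale is never broken just because no suitable fuzzy algorithm exists for its script.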
We also discussed using RemoteSettings for this object: if some of the matchers end up being "bogus", we want to be able to fix them outside the usual product update time frames. Mark provided me and Harry with some additional insight into using RemoteSettings.
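A record in such a RemoteSettings collection might look like this; every field name here is an illustrative assumption, not a final schema:

```json
{
  "locale": "de",
  "interventions": [
    {
      "id": "clear-history",
      "keywords": ["verlauf löschen", "clear history"],
      "fuzzyMatcher": "levenshtein"
    }
  ]
}
```

Shipping the data this way would let us correct a bogus matcher or keyword list without waiting for a release.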
If I missed or misunderstood something from the discussion, please correct me.
Comment 5•4 years ago
Another point I forgot: the l10n team asked us to apply strings to tips and interventions using data-l10n-id in the DOM, rather than passing translated strings from the providers. This is another bug that should be filed, along with the RemoteSettings one.
Comment 6•4 years ago
I'm looking into using NLP.js after Mike mentioned it to me, both for this bug and bug 1606915: https://github.com/axa-group/nlp.js
Comment 7•4 years ago
Another interesting project, used by Chrome, is https://github.com/google/cld3 (if we care about detecting the language of the words the user typed).
Comment 8•4 years ago
Making this a bit more general than "matching data."