Find/FindAsYouType will not find text if entered with diacritics ("nikud") in Hebrew, or accented characters in other languages
Categories
(Core :: Find Backend, enhancement)
Tracking
()
Tracking | Status | |
---|---|---|
firefox73 | --- | fixed |
People
(Reporter: tomer, Assigned: alexhenrie24)
References
(Blocks 3 open bugs)
Details
(Keywords: intl, parity-chrome)
Attachments
(2 files, 7 obsolete files)
4.32 MB, patch
47 bytes, text/x-phabricator-request
Comment 114•6 years ago
|
||
I'm trying to permanently switch from Chrome and I'm hitting this issue almost every time I search for something. Chrome's behavior seems very natural; I'd never realized or appreciated that it was doing this until I tried to search for something in Firefox and it didn't find anything.
I think a reasonable way forward here is to introduce a toggle for it, consistent with the existing options (Match Case, Whole Words...), start with languages where the right behavior is more or less obvious (ě -> e, á -> a etc.), and handle more languages in new tickets as users request them. You will never be able to handle every language in existence, and trying to encapsulate everything in one go is how you get these 17-year-old tickets.
Also, I find it funny that there are people in Chromium's bug tracker who are complaining about the opposite and are calling for the same behavior as in Firefox.
https://bugs.chromium.org/p/chromium/issues/detail?id=71741
https://bugs.chromium.org/p/chromium/issues/detail?id=687552
Assignee | ||
Comment 118•6 years ago
|
||
I just uploaded a relatively simple patch that implements this feature using Unicode decomposition and a cache. It's a 90% solution that will not work as well in some languages as in others, but it keeps searches fast. I turned it on by default and added a new button to the find toolbar to turn it off.
For the same reason as in bug 969980 (because a complete solution would also do things like split digraphs into two characters), I put the diacritic-stripping functions in the "internal API" section.
What I've implemented is exactly what was proposed in comment #69. I realize that there has been a lot of discussion on this topic and short of implementing locale-specific rules, there is no solution that will make everyone happy. However, I believe that what I have is a reasonable solution that will help a lot more people than it annoys.
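For readers who want a feel for the "Unicode decomposition and a cache" approach, here is a minimal sketch in Python. It is not the patch's code: the function names `strip_diacritics_char` and `find_ignoring_diacritics` are made up for illustration, and `functools.lru_cache` stands in for whatever cache the real implementation uses. Each character is canonically decomposed and replaced by its base letter only when everything after the base is a combining mark.

```python
import unicodedata
from functools import lru_cache


@lru_cache(maxsize=4096)  # hypothetical cache size; repeated characters are looked up once
def strip_diacritics_char(ch: str) -> str:
    """Return the base letter of a single character, or the character unchanged."""
    decomposed = unicodedata.normalize("NFD", ch)
    # Strip only when the decomposition is "base + combining marks";
    # anything else (e.g. Hangul syllables) is left alone.
    if len(decomposed) > 1 and all(unicodedata.combining(c) for c in decomposed[1:]):
        return decomposed[0]
    return ch


def find_ignoring_diacritics(haystack: str, needle: str) -> int:
    """Return the index of the first diacritic-insensitive match, or -1."""
    # The mapping is one character to one character, so indices in the
    # folded text line up with indices in the original text.
    folded_haystack = "".join(strip_diacritics_char(c) for c in haystack)
    folded_needle = "".join(strip_diacritics_char(c) for c in needle)
    return folded_haystack.find(folded_needle)
```

Because the fold is strictly per character, a sketch like this handles precomposed letters such as ě and á but says nothing about digraphs or locale-specific rules, which matches the "90% solution" framing above.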
Comment 119•6 years ago
|
||
Thank you Alex!
Just to make my position from Phabricator clear: I believe we should minimize our reliance on ICU. We currently use it where needed, but we're also slowly replacing pieces of it with Rust Unicode crates, and in the future I can see us replacing it completely.
If Manish says that plugging this into unicode-normalization is easy, I'd love to see us use that crate (which is already vendored in, so we carry the data). But I believe it's more important to get this feature, so if it's much easier to land it as-is, I'd just like to ask for a follow-up bug to be filed to replace the backend with unicode-normalization.
Assignee | ||
Comment 120•6 years ago
|
||
I wouldn't mind using Rust here, it just seems like a big pile of extra work for something that we'll eventually have to redesign anyway to support locale-specific and multi-character decompositions. So yeah, I'd rather leave the transition to Rust for a follow-up bug.
Comment 121•6 years ago
|
||
So yeah, I'd rather leave the transition to Rust for a follow-up bug.
Sounds good to me! It would be good then to use the review process here to verify that the way the API is written makes the switch to unicode-normalization easy.
Comment 122•6 years ago
|
||
Since you just landed bug 1590167, which migrated hyphenation to Rust, do you have any thoughts on using ICU vs. unicode-normalization (both vendored in) for this?
Comment 123•6 years ago
|
||
It looks like it shouldn't be very hard to use the Rust unicode-normalization crate for this, by implementing a GetNaked function based on https://docs.rs/unicode-normalization/0.1.9/unicode_normalization/char/fn.decompose_canonical.html, but it's a little more effort than using ICU, as it doesn't look like unicode-normalization currently offers an appropriate FFI interface.
So given that we're already using ICU's normalizer2 in Firefox C++ code (it's used in at least js/src/builtin/String.cpp and gfx/thebes/gfxHarfBuzzShaper.cpp), ISTM the simplest way forward is to use it here as well. We could have a followup bug to replace the use of ICU normalization APIs with the Rust crate, which will presumably involve first adding FFI functions on the Rust side.
Comment 126•6 years ago
|
||
Backed out changeset b89936db7178 (Bug 202251) for bc failures at browser_misused_characters_in_strings.js.
https://hg.mozilla.org/integration/autoland/rev/469a80cf68cabebfebe740c77fb6f488587beb03
Failure log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=279869012&repo=autoland&lineNumber=2570
Comment 128•6 years ago
|
||
bugherder |
Comment 129•6 years ago
|
||
Thank you for working on this! Works like a charm.
Comment 130•5 years ago
|
||
on firefox 73.0.1, this is not working for me with Hebrew diacritics
e.g. on: https://www.mechon-mamre.org/i/t/t0101.htm
בְּרֵאשִׁית
is not matched when searching for בראשית
Comment 131•5 years ago
|
||
(In reply to eyal gruss (eyaler) from comment #130)
on firefox 73.0.1, this is not working for me with Hebrew diacritics
e.g. on: https://www.mechon-mamre.org/i/t/t0101.htm
בְּרֵאשִׁית
is not matched when searching for בראשית
That is bug 1611568 and it will be fixed in Firefox 75.
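One property of pointed Hebrew that is relevant here: nikud are stored as separate combining code points after each consonant, not as precomposed letters. The Python sketch below illustrates why folding precomposed characters one at a time leaves such text untouched, while dropping nonspacing marks makes it match. It is an illustration only and makes no claim about what bug 1611568 actually changed; the function names are hypothetical, and the pointed word is built from escapes, so its mark order may differ from the page's.

```python
import unicodedata

# "Bereshit" with nikud, and the same word unpointed.
POINTED = "\u05d1\u05b0\u05bc\u05e8\u05b5\u05d0\u05e9\u05b4\u05c1\u05d9\u05ea"
PLAIN = "\u05d1\u05e8\u05d0\u05e9\u05d9\u05ea"


def fold_precomposed_only(text: str) -> str:
    """Replace each precomposed character with its base letter, one code point at a time."""
    out = []
    for ch in text:
        d = unicodedata.normalize("NFD", ch)
        is_base_plus_marks = len(d) > 1 and all(unicodedata.combining(c) for c in d[1:])
        out.append(d[0] if is_base_plus_marks else ch)
    return "".join(out)


def fold_dropping_marks(text: str) -> str:
    """Decompose the text, then drop every nonspacing mark (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


# The standalone marks survive per-character folding, so the plain word is not found.
print(PLAIN in fold_precomposed_only(POINTED))
# Dropping the marks yields the unpointed spelling.
print(fold_dropping_marks(POINTED) == PLAIN)
```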
Comment 132•5 years ago
|
||
On Firefox 74.0, this is not working with the letter "ı". ı/I should be matched with i/İ.
"I" is described in the Turkic section here: https://en.wikipedia.org/wiki/Diacritic#Languages_with_letters_containing_diacritics
Assignee | ||
Comment 133•5 years ago
|
||
Ozkuslar, could you open a new bug report about that and add me to its CC list please?
Comment 134•5 years ago
|
||
(In reply to Tomer Cohen :tomer from comment #0)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4a) Gecko/20030401
Build Identifier:
Following is a link to the Passover Haggadah (the Jewish story for the Passover holiday), in Hebrew with inline punctuation. If you search for Passover ("פסח"), you'll get only one result, the one in the title bar, which has no punctuation at all, but if you search for the punctuated version of the same word ("פֶּסַח"), you'll get the other results.
Reproducible: Always
Steps to Reproduce:
Expected Results:
Mozilla should ignore punctuation characters in search queries, to make it possible to omit them in the search.
After the fix for https://bugzilla.mozilla.org/show_bug.cgi?id=1611568 in Firefox 75, it works.
Happy Passover.
Comment 135•5 years ago
|
||
I just noticed that this feature was never documented
https://support.mozilla.org/en-US/kb/search-contents-current-page-text-or-links
Not exactly sure what the process is to get it fixed.
Comment 136•5 years ago
|
||
(In reply to Francesco Lodolo [:flod] from comment #135)
I just noticed that this feature was never documented
https://support.mozilla.org/en-US/kb/search-contents-current-page-text-or-links
Not exactly sure what the process is to get it fixed.
Thanks for flagging me on this. We'll get this documented by the end of the week.