1707842 - Function of "Match Diacritics" button for Asian languages is not clearly explained

Reporter

Description

•

3 years ago

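The Ctrl+F find-in-page feature used to distinguish Chinese and English punctuation, full-width and half-width spaces, and so on when highlighting results. After today's automatic update this is broken: searching for an English comma also highlights Chinese commas, and even the enumeration comma (、).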

When I use Find in Page on this message, a search for the half-width comma [,] also matches the full-width comma [，] and the enumeration comma [、]. The distinction may not matter in English, but these two symbols serve different grammatical functions in Chinese.

This problem may be a regression from Bug 1652910. In any case, people want to be able to match half-width and full-width symbols exactly, which is very important for CJK language users.

https://en.wikipedia.org/wiki/Halfwidth_and_fullwidth_forms

Jonathan Kew [:jfkthame]

Comment 1

•

3 years ago

If you check the option "Match Diacritics" in the Find bar, these characters will be distinguished so that searching for [,] only matches [,] and not the other "comma" characters.

(I realize this may not be obvious -- they are related characters that can be considered a "loose" or "approximate" match, so it's similar to the case of whether "resume" should or should not match "résumé" in English, but in this case they're not actually "diacritics". Maybe there's another name for the option that would be clearer, without becoming too long and detailed...)
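As a rough illustration of the "loose" versus "exact" distinction described above, here is a minimal Python sketch. It assumes that stripping combining marks after Unicode NFD decomposition is a fair approximation of diacritic-insensitive matching; the helper names `strip_diacritics` and `loose_equal` are hypothetical, and this is not Firefox's actual implementation.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose each character (NFD) and drop the combining marks.

    This is only an approximation of diacritic-insensitive matching,
    not the table Firefox uses.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def loose_equal(a: str, b: str) -> bool:
    """Compare two strings while ignoring diacritics (hypothetical helper)."""
    return strip_diacritics(a) == strip_diacritics(b)


if __name__ == "__main__":
    print(loose_equal("resume", "résumé"))  # loose comparison
    print("resume" == "résumé")             # exact comparison
```

With this sketch, "resume" and "résumé" compare equal under the loose comparison and differ under the exact one, which mirrors the option being off or on.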

BugBot [:suhaib / :marco/ :calixte]

Updated

•

3 years ago

Keywords: regression

yxu

Reporter

Comment 2

•

3 years ago

Thanks, I hadn't noticed that option. But when I tried unchecking this option in Firefox 87, the symbols still matched exactly. So is the new behavior expected, and was the previous version's behavior actually abnormal?

Alex Henrie

Comment 3

•

3 years ago

•

Edited

Yes, this is expected behavior. Before Firefox 88, searching was too strict (Bug 1649187). Now that the search algorithm is more relaxed, some variants of punctuation match each other that did not match before. This was a nice side effect because Firefox previously had a hack to make the stylized quotation marks U+201C and U+201D match the ordinary quotation mark U+0022 when "Match Case" was off, but now they match when "Match Diacritics" is off, there is no hack, and "Match Case" has nothing to do with it.

I was unaware that Chinese used U+3001 for separating items in a list as opposed to the ordinary full-width comma U+FF0C, so thank you for educating me. However, if I understand correctly from Wikipedia,[1] in documents that mix English and Japanese U+3001 is used if the comma is considered part of the Japanese text and U+FF0C is used if the comma is considered part of the English text. That difference could be subjective if the document switches back and forth between English and Japanese frequently, so it may be helpful to Japanese speakers to consider U+FF0C equivalent to U+3001 when searching.

The bottom line is that it's not clear (to me, at least) that it would be an improvement overall to make U+3001 and U+FF0C distinct in all search modes again. It would, however, be an improvement to add some more help text when hovering over the "Match Diacritics" button to explain this behavior.

[1] https://en.wikipedia.org/wiki/Comma#Languages_other_than_Western_European

Jonathan Kew [:jfkthame]

Comment 4

•

3 years ago

I wonder if there's an alternative term we could use for the option, given that it applies to more than just diacritics (although that's the largest category of what it affects). Maybe "Strict Matching"? Or invert the true/false state and call it "Loose Matching" or "Match Related Characters"? Other ideas....?

Mike, any thoughts on how we might make the UI clearer here?

Flags: needinfo?(mdeboer)

Alex Henrie

Comment 5

•

3 years ago

Attached file: Bug 1707842 - Rename "Match Diacritics" to "Exact Characters" in the UI.

Phabricator Automation

Updated

•

3 years ago

Assignee: nobody → alexhenrie24

Status: NEW → ASSIGNED

Phabricator Automation

Updated

•

3 years ago

Attachment #9219431 - Attachment description: WIP: Bug 1707842 - Clarify effect of "Match Diacritics" button on punctuation. → Bug 1707842 - Clarify effect of "Match Diacritics" button on punctuation.

Phabricator Automation

Updated

•

3 years ago

Attachment #9219431 - Attachment description: Bug 1707842 - Clarify effect of "Match Diacritics" button on punctuation. → Bug 1707842 - Rename "Match Diacritics" to "Exact Characters" in the UI.

Jonathan Kew [:jfkthame]

Comment 6

•

3 years ago

I'm not sure I'm entirely comfortable with "Exact Characters" for the label; that name sounds like it would encompass case as well as any other kinds of "exactness", so that it makes the "Match Case" option redundant if it's enabled.

@Alex, @flod (and anyone else with opinions....): what do you think about "Strict Matching"? That seems to me a little less categorical than "Exact", so it leaves more room conceptually for case sensitivity to still be split out as a separate option.

Flags: needinfo?(francesco.lodolo)

Flags: needinfo?(alexhenrie24)

Francesco Lodolo [:flod]

Comment 7

•

3 years ago

I don't think Strict Matching would be clear to end users: what's strict, and what does it imply? I kind of like the idea of inverting the flag and using something like Match related/similar characters (enabled by default).

With that said, I strongly suggest involving the Content Team (Meridel, Betsy) in this decision: they just reviewed the whole UI for MR1, and I think they're the right people to ask the right questions and make sure the resulting message is clear.

Flags: needinfo?(francesco.lodolo)

Jonathan Kew [:jfkthame]

Comment 8

•

3 years ago

Meridel, Betsy: as Francesco says (comment 7), we could really use some input here to help figure out the best wording/label for the Find-Bar option that is currently called "Match Diacritics"; see discussion above for more detail, and why the current label isn't ideal.

There are a few possible suggestions above, but we're not confident which would be the best choice. If you could review the discussion and offer any guidance (or ask more questions as needed, to help find the way forward...) that would be greatly appreciated. Thanks!

Flags: needinfo?(mwalkington)

Flags: needinfo?(bmikel)

Alex Henrie

Comment 9

•

3 years ago

Here are my thoughts from https://phabricator.services.mozilla.com/D113907#3703047 in case people are just looking at this bug report:

I don't like the idea of inverting the sense of the button (changing it from "Match Diacritics" to "Loose Matching") because it's right next to the "Match Case" and "Whole Words" buttons which both make the search more strict. However, I agree that the phrase "Match Diacritics" doesn't make much sense for Asian languages, where the button's most visible effect is to prevent halfwidth characters from matching fullwidth characters. And there have also been complaints about the word "diacritic" because what is considered a separate letter as opposed to a letter plus a diacritic varies from language to language.

How about we relabel the button "Exact Characters" in addition to expanding the help text? That would look nice next to "Whole Words" without being too wordy or too vague.

I know that "Exact Characters" might sound like it implies "Match Case" (though in fact they are separate features), but I still think it would be a little more clear than "Strict Matching" (which also might sound like it implies "Match Case").

Flags: needinfo?(alexhenrie24)

Summary: Characters cannot be distinguished correctly when using find in page → Function of "Match Diacritics" button for Asian languages is not clearly explained

Meridel [:meridel]

Comment 10

•

3 years ago

Can you help me understand what all is included in the "Match Diacritics" filter, beyond diacritics?
Am I correct in understanding that the issue with the current label for Asian languages is that it does not communicate that this filter allows you to sort by width of symbols? Are there other comprehension issues in addition to this?

Flags: needinfo?(mwalkington)

Alex Henrie

Comment 11

•

3 years ago

Attached file: BaseChars.h

The complete list of characters that are considered equivalent when "Match Diacritics" is off can be found in the file obj-*/intl/unicharutil/util/BaseChars.h after compiling Firefox. I am also attaching this file for reference. Stylized forms of various letters and punctuation marks are included in the list.
Apart from equating halfwidth and fullwidth characters, the two fullwidth commas U+3001 and U+FF0C are also considered equivalent when "Match Diacritics" is off, which is what originally led to this bug report. By the way, the button only controls search behavior, not sorting behavior.
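To make the three commas discussed in this bug easier to tell apart, the following Python sketch prints each one's code point, its Unicode name, and what standard NFKC compatibility normalization folds it to. This uses only Python's `unicodedata` module and is not how BaseChars.h is generated; the `describe` helper is hypothetical. It suggests that standard normalization folds the fullwidth comma to the ASCII comma but leaves U+3001 alone, so the equivalence of U+3001 and U+FF0C described above goes beyond NFKC.

```python
import unicodedata

# The three characters from this bug report.
COMMAS = [",", "\uff0c", "\u3001"]


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, Unicode name, NFKC-folded code points) for one character."""
    folded = unicodedata.normalize("NFKC", ch)
    folded_points = " ".join(f"U+{ord(c):04X}" for c in folded)
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), folded_points)


if __name__ == "__main__":
    for ch in COMMAS:
        print(describe(ch))
```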

Meridel [:meridel]

Comment 12

•

3 years ago

Thank you. I've learned that PM said string changes for Find in Page are not in scope/not a priority right now. That being said, while this is top of mind, does the "Match diacritics" filter include "Match Case"?

Flags: needinfo?(bmikel)

Jonathan Kew [:jfkthame]

Comment 13

•

3 years ago

"Match Diacritics" and "Match Case" are independent options.

So "Match Case" is only about distinguishing upper- and lower-case letters, while "Match Diacritics" covers a bunch of other "similar characters" situations, such as "straight" vs “curly” quotes, halfwidth vs fullwidth Japanese characters, Latin-script letters with accents (diacritics), and so on.

Either option can be enabled independently, or both used together for the "most exact" kind of matching.
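The independence of the two options can be sketched in Python as two separate folding steps. This is an illustrative model only: it assumes NFKD normalization plus removal of combining marks stands in for the real equivalence table, and the `EXTRA_FOLDS` mapping, `fold`, and `matches` are hypothetical names covering just the cases mentioned in this thread.

```python
import unicodedata

# Hypothetical extra equivalences for cases mentioned in this thread that
# Unicode normalization alone does not fold (curly quotes, U+3001).
EXTRA_FOLDS = {"\u201c": '"', "\u201d": '"', "\u3001": ","}


def fold(text: str, match_case: bool, match_diacritics: bool) -> str:
    """Reduce text to the form that is compared under the given options."""
    out = []
    for ch in text:
        if not match_diacritics:
            # Loose mode: fold related characters to a base form.
            ch = EXTRA_FOLDS.get(ch, ch)
            ch = unicodedata.normalize("NFKD", ch)
            ch = "".join(c for c in ch if not unicodedata.combining(c))
        if not match_case:
            # Case folding is a separate step, independent of the one above.
            ch = ch.lower()
        out.append(ch)
    return "".join(out)


def matches(query: str, candidate: str,
            match_case: bool = False, match_diacritics: bool = False) -> bool:
    """Return True if the two strings are equal under the chosen options."""
    return (fold(query, match_case, match_diacritics)
            == fold(candidate, match_case, match_diacritics))
```

In this model, `matches(",", "\u3001")` holds with both options off and fails once `match_diacritics=True`, while `match_case` has no effect on either result.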

BMO Automation

Updated

•

3 years ago

Has Regression Range: --- → yes

BugBot [:suhaib / :marco/ :calixte]

Comment 16

•

2 years ago

Redirect a needinfo that is pending on an inactive user to the triage owner.
:jfkthame, could you please find another way to get the information or close the bug as INCOMPLETE if it is not actionable?

For more information, please visit auto_nag documentation.

Flags: needinfo?(mdeboer) → needinfo?(jfkthame)

Jonathan Kew [:jfkthame]

Comment 18

•

2 years ago

I think we still need to try and find a way forward here, as the current label of the Match Diacritics option doesn't accurately reflect what it does. I guess I could accept Exact Characters as an improvement, though I'm not 100% happy with it (as discussed above).

Another possible variant (feedback welcome): Exact Matching.

Alternative suggestion: rearrange the order of the options in the Find bar, so that this option comes at the end instead of in between Match Case and Whole Words. Then it would be less confusing to invert the sense of the flag (as per comment 7), and name it something like Loose Matching or Match Similar Characters. So the Find bar (in its initial state) would be something like:

✕ ⎣Find in page_____________⎦ ∧ ∨ ☐ Highlight All ☐ Match Case ☐ Whole Words ☒ Match Similar Characters

This way the two options that make matching stricter (and are off by default) are adjacent, and then we have the option that makes it less strict (and is enabled by default).

Meridel, can you (or a colleague) help with any UX guidance here? I think the discussion from a year ago gives pretty much all the relevant background, but of course happy to try and clarify anything further if necessary.

Flags: needinfo?(jfkthame) → needinfo?(mwalkington)

Jonathan Kew [:jfkthame]

Comment 19

•

2 years ago

Also moving this from Find Backend to the Find Toolbar component, which I think is where it more properly belongs -- this isn't about the functionality but about the terminology the toolbar uses to present it.

(This doesn't alter the fact that we could use some UX guidance as to the best way forward -- but maybe others on the Toolkit team will also have helpful ideas to offer.)

Component: Find Backend → Find Toolbar

Product: Core → Toolkit

Meridel [:meridel]

Comment 20

•

2 years ago

Romain, I am not sure if you are the PM to ask about this, but is this work in scope? Please advise and then I can triage.

Flags: needinfo?(mwalkington) → needinfo?(rtestard)

Romain Testard [:RT]

Comment 21

•

2 years ago

It's not currently in scope. We had agreed that we should do this as part of a refresh of the component, which has to be prioritized on the product side against other priorities. I suggest we leave this open since it's relevant, but it should be addressed once we can focus on the "Find in page" component refresh.

Flags: needinfo?(rtestard)

BugBot [:suhaib / :marco/ :calixte]

Comment 22

•

2 years ago

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: alexhenrie24 → nobody

Status: ASSIGNED → NEW

Neil Deakin

Updated

•

2 years ago

Severity: -- → S3

Neil Deakin

Updated

•

4 months ago

Duplicate of this bug: 1868914

Neil Deakin

Updated

•

1 month ago

Duplicate of this bug: 1893777

Bug 1707842 - Rename "Match Diacritics" to "Exact Characters" in the UI. (3 years ago, Alex Henrie, 48 bytes, text/x-phabricator-request)
BaseChars.h (3 years ago, Alex Henrie, 84.96 KB, text/plain)