Multi-spellchecker: English + Russian ignores all errors in Cyrillic text
Categories
(Core :: Spelling checker, defect, P3)
Tracking
()
People
(Reporter: mentin, Assigned: dminor)
References
Details
Attachments
(2 files, 1 obsolete file)
48 bytes,
text/x-phabricator-request
|
pascalc
:
approval-mozilla-release+
dmeehan
:
approval-mozilla-esr102+
|
Details | Review |
48 bytes,
text/x-phabricator-request
|
pascalc
:
approval-mozilla-release+
dmeehan
:
approval-mozilla-esr102+
|
Details | Review |
Steps to reproduce:
- Open some web site with text box.
- Type some words in English and Russian, mixing correct and invalid words, e.g.
"correct правильный, incarrect непровильный"
(correct English and correct Russian words, followed by grammar errors in both) - Enable 'Check Spelling',
- Make sure enabled languages are 'English (United States)' and 'Russian'
Actual results:
When both English and Russian spellcheckers are enabled,
the Latin words are correctly checked with English spellchecker,
but any Cyrillic word is assumed correct and is never flagged as invalid.
I.e. with example above only the third word is flagged.
The fourth word, while being invalid one is not flagged, which is a bug.
The workaround is to turn off English spellchecker, which
flags all Latin words and incorrect Cyrillic words.
I.e. with example above the first, third and fourth word are flagged.
Expected results:
I expect to be able to turn on both English and Russian spellcheckers, and
that to flag words that are neither correct English, nor correct Russian words.
I.e. with example above I expect the third and fourth word to be flagged,
like it works in e.g. Chrome or Microsoft Word.
Comment 1•2 years ago
|
||
The Bugbug bot thinks this bug should belong to the 'Core::Spelling checker' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 2•2 years ago
|
||
Have multiple simultaneously active spell checkers worked before?
Updated•2 years ago
|
The ability to enable multiple spell checkers is a relatively new feature - see bug 1402822, and deep history in bug 69687.
So it is a defect report for a new feature, which is not yet working in a usable way, not a regression.
Assignee | ||
Comment 4•2 years ago
|
||
I think we have a problem when spellchecking with the English dictionary and another language which uses a non-latin alphabet, this has been reported for Greek and Hebrew as well. (Bugs 1767932, Bug 1772762). I've had no problems with multilingual spellchecking with English and Spanish, and when I investigated the Greek bug, Spanish and Greek seemed to work as expected.
Comment 5•2 years ago
|
||
This seems rather bad bug. Dan, do you think you have time to take a look.
Assignee | ||
Comment 6•2 years ago
|
||
Sure, I'll have a look.
Assignee | ||
Comment 7•2 years ago
|
||
It's definitely something with the character sets, with just the en-US default dictionary installed, in the test string correct правильный, incarrect непровильный
only incarrect
is flagged as misspelled, but the Cyrillic text should show up as misspelled as well.
For some reason, the en-US dictionary is accepting these strings.
Assignee | ||
Comment 8•2 years ago
|
||
This adds a check to see if the encoded word is not empty and does not start
with the null character. Hunspell accepts C-style strings and marks the empty
string as correctly spelled. This prevents other dictionaries from detecting
misspelled strings in languages that use a different charset, which has lead
to problems when using the en-US dictionary packaged with Firefox and Greek,
Hebrew and Russian dictionaries, where misspellings are not detected in the
non-English language.
Assignee | ||
Comment 9•2 years ago
|
||
These were added in Bug 1162823, but there is no explanation given
as to why we would want to ignore non-Latin words when spellchecking
when using the en-US dictionary. Doing this makes it impossible to
spellcheck using the en-US dictionary and a non-Latin dictionary at
the same time.
Depends on D149899
Updated•2 years ago
|
Reporter | ||
Comment 10•2 years ago
|
||
As a user and reporter of this bug, I think the behavior introduced in Bug 1162823 did make some sense when user could only select a single dictionary. At that time, having all non-Latin words in mixed text highlighted as errors only made it harder to find real errors in English words, so while it is a questionable behavior, it was probably overall win for the user.
But now with support for multiple simultaneous dictionaries, this behavior is no longer needed or beneficial, thanks for making this change.
Assignee | ||
Comment 11•2 years ago
|
||
Try job with the updated test: https://treeherder.mozilla.org/jobs?repo=try&revision=85e4ec7d663bc1f1bac47ff3cfe7d786d3c79e4c
Comment 12•2 years ago
|
||
Assignee | ||
Comment 15•2 years ago
|
||
Comment 16•2 years ago
|
||
Comment 17•2 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/38138f07343b
https://hg.mozilla.org/mozilla-central/rev/9792b73e9e1a
Assignee | ||
Comment 18•2 years ago
|
||
Comment on attachment 9282233 [details]
Bug 1773802 - Ignore empty strings when spellchecking; r=smaug!
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: Without this patch, multilingual spellchecking doesn't work properly when using the en-US dictionary and a non-latin character set.
- User impact if declined: Broken multilingual spellchecking for users who want to spellcheck English and Russian, Hebrew, Greek or other languages at the same time.
- Fix Landed on Version: 103
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Low risk, because the patch is very small and the functionality is covered by unit tests.
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 19•2 years ago
|
||
Comment on attachment 9282233 [details]
Bug 1773802 - Ignore empty strings when spellchecking; r=smaug!
Beta/Release Uplift Approval Request
- User impact if declined: Broken multi-lingual spellchecking when using en-US dictionary and a language with a non-latin character set like Russian, Greek, Hebrew.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Low risk, the patch is small and is covered by automated tests.
- String changes made/needed: None
- Is Android affected?: Unknown
Assignee | ||
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Comment 20•2 years ago
|
||
Comment on attachment 9282233 [details]
Bug 1773802 - Ignore empty strings when spellchecking; r=smaug!
Approved for 102.0.1, thanks.
Comment 21•2 years ago
|
||
Comment on attachment 9282632 [details]
Bug 1773802 - Specify charset in new test; r=smaug
Approved for 102.0.1, thanks.
Comment 22•2 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-release/rev/d1fc5f75c501
https://hg.mozilla.org/releases/mozilla-release/rev/de4b32c5baa6
Comment 23•2 years ago
|
||
Comment on attachment 9282233 [details]
Bug 1773802 - Ignore empty strings when spellchecking; r=smaug!
Approved for 102.1esr.
Comment 24•2 years ago
|
||
Comment on attachment 9282632 [details]
Bug 1773802 - Specify charset in new test; r=smaug
Approved for 102.1esr.
Comment 25•2 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-esr102/rev/a1cfd94c9e4d
https://hg.mozilla.org/releases/mozilla-esr102/rev/b891d76c265b
Comment 26•2 years ago
|
||
uplift |
FIREFOX_ESR_102_0_X_RELBRANCH :
https://hg.mozilla.org/releases/mozilla-esr102/rev/2973eeb3494275d85ecb081c5e74e0e6b490f98f
https://hg.mozilla.org/releases/mozilla-esr102/rev/a5142f82decbccb5a34bd02b355fb4c00c7d77cf
Description
•