Closed Bug 1162823 Opened 4 years ago Closed 4 years ago

en-us spelling checker should not mark non Latin words as misspelled.

Categories

(Core :: Spelling checker, defect)

40 Branch
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox39 --- unaffected
firefox40 + fixed

People

(Reporter: alice0775, Assigned: ehsan)

References

Details

(Keywords: regression)

Attachments

(2 files)

[Tracking Requested - why for this release]: regression

Reported on http://forums.mozillazine.org/viewtopic.php?p=14147555#p14147555

Steps To Reproduce:
1. installed en-US nightly https://ftp.mozilla.org/pub/mozilla.org ... a-central/
2. type any non en-us languages into textarea or copy みなさん、こんにちは and paste into textarea.

Actual Results::
spelling checker mark as misspelled words

Expected Results:
it should not mark all non Latin words (korean, japanese, arab, thai, ...) as misspelled.

Regression window:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=74e3f0993a06&tochange=d1b2c31a46d0

Regressed by: Bug 1137544
Flags: needinfo?(kevin.bugzilla)
Summary: en-us spelling checker should not mark non Latin words. → en-us spelling checker should not mark non Latin words as misspelled.
Version: 36 Branch → 40 Branch
The latest Hunspell en-us dictionary is now in UTF-8.

If this causes a problem you can change it back to ISO-8859-1 but you would also need to remove the ICONV lines.

If this is required to be a persistent change than fixing the upgrade script to patch the affix file would be the best bet, an alternative would be for my scripts to just not replace the affix file but that has the potential to create problems down the road.

Ehsan: Do you have any time to figure this out?
Flags: needinfo?(ehsan)
Here is a patch to revert the en-US dictionary to use iso8859-1 instead of UTF-8.

You can use it to see if it solves the current problem.
Flags: needinfo?(kevin.bugzilla)
Comment on attachment 8603119 [details] [diff] [review]
en-US-iso88591.diff

All of files in dictionary-sources are not going to build. So this patch actually doesn't fix the problem. :)
https://mxr.mozilla.org/mozilla-central/source/extensions/spellcheck/locales/Makefile.in#5

Ehsan, also we should re-enable spell check on these changes too, yes?
https://hg.mozilla.org/mozilla-central/rev/f0935eccab87
Oops, I'll fix this.  I can't believe I mistakenly broke a test that caught this, but we should have better tests for this...
Assignee: nobody → ehsan
Flags: needinfo?(ehsan)
Attachment #8603344 - Flags: review?(ananuti)
Ekanan, I would appreciate if you could please help review this.  :-)

In case you're not familiar with the reftest framework, it is documented here: <https://developer.mozilla.org/en-US/docs/Creating_reftest-based_unit_tests>.  Essentially it loads two pages and compares the rendering to make sure that they are equal.  The reftests that I have added have one text box with non-Latin chars, focused, to ensure that we do spell checking on it, and another one with spellcheck="false".  If the spell checker marks the non-Latin chars as misspelled, the rendering won't be equal because of the squiggly red underlines...  Hope it all makes sense!
(In reply to Ekanan Ketunuti from comment #3)
> Ehsan, also we should re-enable spell check on these changes too, yes?
> https://hg.mozilla.org/mozilla-central/rev/f0935eccab87

I did that, but note that one of the test fixes in that commit is unrelated to this issue, so I left that one in.
Ehsan:  Just a heads up I noticed you took my advice and changed the affix file to ISO-8859-1.  That's fine for now but it could create complications down the road if any non-ascii words make it into the wordlist.  On my side that is not going to happen as I convert words like café to cafe but it might happen on your side.
Flags: needinfo?(ehsan)
Comment on attachment 8603344 [details] [diff] [review]
Do not treat non-Latin words as misspelled

Review of attachment 8603344 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM
Attachment #8603344 - Flags: review?(ananuti) → review+
https://hg.mozilla.org/mozilla-central/rev/a9a4f21d0daf
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
(In reply to Kevin Atkinson from comment #8)
> Ehsan:  Just a heads up I noticed you took my advice and changed the affix
> file to ISO-8859-1.  That's fine for now but it could create complications
> down the road if any non-ascii words make it into the wordlist.  On my side
> that is not going to happen as I convert words like café to cafe but it
> might happen on your side.

Yeah, that's true.  I'll try to figure out what causes this bug when the en-US is marked as UTF-8, and will try to fix it.
Flags: needinfo?(ehsan)
Ehsan do you want to file a new bug for that issue, or reopen this one? 

It seems to me the regression is fixed and comment 8 is more about a potential problem. As this stands I don't think we need to track it for 40 or 41.
Flags: needinfo?(ehsan)
See Also: → 1164263
(In reply to Liz Henry (:lizzard) from comment #13)
> Ehsan do you want to file a new bug for that issue, or reopen this one? 

Filed bug 1162823.

> It seems to me the regression is fixed and comment 8 is more about a
> potential problem.

Indeed.

> As this stands I don't think we need to track it for 40
> or 41.

Hmm, I don't agree with this part.  We should still track this for 40, to get it on the radar if the patch ends up being backed out for some reason down the line, because this would be a pretty bad regression to ship accidentally.
Flags: needinfo?(ehsan)
OK. Thanks Ehsan. That's a good point!
You need to log in before you can comment on or make changes to this bug.