Closed Bug 1144254 Opened 6 years ago Closed 5 years ago

naïve is not in the en-US dictionary

Categories

(Core :: Spelling checker, defect)

x86
macOS
defect
Not set
normal

Tracking


RESOLVED FIXED
mozilla46
Tracking Status
firefox46 --- fixed

People

(Reporter: jrmuizel, Assigned: jorgk-bmo)

Attachments

(1 file, 2 obsolete files)

It should be
(Comment 1 is incorrect, as discussed in bug 1183512 comment 2. Reopening & tagging that comment as obsolete.)
Summary: naïve is not in the dictionary → naïve is not in the en-US dictionary
Looks like the spelling with the diaeresis has its merit:
https://en.wiktionary.org/wiki/na%C3%AFve
Also: naïvely, naïveness, naïveté.
Ekanan, can you please make this change? As I said in comment #2, "naïve" is in the British dictionary and I don't see why it shouldn't be in the US dictionary.
Status: RESOLVED → REOPENED
Flags: needinfo?(ananuti)
Resolution: INVALID → ---
OK, both OFD and M-W have this in en-US. patch coming.
Flags: needinfo?(ananuti)
Assignee: nobody → ananuti
Attachment #8702227 - Flags: review?(ehsan)
Comment on attachment 8702227 [details] [diff] [review]
Add naïve, naïvely, naïver, naïvest, naïveness and naiveness to the en-US dictionary

Unfortunately, we can't fix this bug without UTF-8 in the affix file. *sigh*

If we use UTF-8, the spellchecker will treat non-Latin words as misspelled. see bug 1162823. :(

Maybe WONTFIX?
Attachment #8702227 - Flags: review?(ehsan)
Can you please elaborate further?
> Unfortunately, we can't fix this bug without UTF-8 in the affix file.
So why don't we use UTF-8?

Is there a word missing?
> If we use UTF-8, the spellchecker will treat non-Latin words as [not?] misspelled.
I assume "naïve" classifies as non-Latin.

Looking at en-GB.dic (the one maintained by Marco A.G.Pinto), I see:

naive/YT
naiveness
naivete/Z
naivety/SM
naiveté/SM

naïve/Y
naïveness
naïvety/S
naïveté/S

So why does it work there? I'm using the GB dictionary to write this comment and it works just fine.
en-GB.aff has: SET UTF-8
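
A minimal sketch of how a tool might honour the `SET` line when reading a dictionary, so that en-GB (UTF-8) and en-US (ISO8859-1) are each decoded correctly. The helper name `aff_encoding` is hypothetical, and the fallback to ISO8859-1 when no `SET` line is present is an assumption about Hunspell's default, not something stated in this bug.

```python
import codecs

def aff_encoding(aff_text: str, default: str = "ISO8859-1") -> str:
    """Return a Python codec name for the SET line of a Hunspell .aff file.

    Hypothetical helper: falls back to `default` (an assumption) when the
    affix file declares no encoding.
    """
    for line in aff_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "SET":
            # codecs.lookup() raises LookupError for names Python does not know.
            return codecs.lookup(parts[1]).name
    return codecs.lookup(default).name

# The matching .dic file must be decoded with whatever the .aff declares.
gb_codec = aff_encoding("SET UTF-8\nTRY esia")
us_codec = aff_encoding("SET ISO8859-1\nTRY esia")
print(gb_codec, us_codec)
```

The point is only that the `.dic` file has to be read (and written) with the codec its `.aff` file names; both encodings can represent "naïve".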

The other word typically spelled with an accent is "résumé" (as in CV):
https://en.wikipedia.org/wiki/R%C3%A9sum%C3%A9
In the GB dictionary I see:
résumé/S

In fact, if you grep for á, é, í or ó in the GB dictionary, you find heaps, like:
Bogotá (https://en.wikipedia.org/wiki/Bogot%C3%A1, capital of Colombia, country in South America) or cliché (https://en.wikipedia.org/wiki/Clich%C3%A9).

With all due respect to you and Ehsan, I think we should move the maintenance of the dictionary out of Core::Spelling Checker. This is really a community effort and shouldn't involve (busy) core developers.

I think the French model is great (from bug 1229406): You request words here
http://www.dicollecte.org/dictionary.php?prj=fr
and it gets done for you.
Summary: naïve is not in the en-US dictionary → naïve is not in the en-US dictionary (and neither are many other accented terms that have Wikipedia entries, like Bogotá or cliché).
OK, perhaps comparing to en-GB is not the right thing to do. So let's compare to the add-on en-US dictionary from https://addons.mozilla.org/en-US/firefox/addon/united-states-english-spellche/.

Affix file says:
SET ISO8859-1 (so Latin with some accented characters, etc.).

Now let's look for some words:
clichéd
cliché/SM
(heaps of words with é)
They also have Bogotá.
And they have:
naive/SRTYP
naiveté/SM
naivety/MS
They don't have naïve.

Anyway, I don't see why the en-US dictionary that ships with Mozilla products should be worse than others.
Why would UTF-8 be necessary, why is ISO8859-1 not good enough?

Let's fix all the issues in bug 1235506.
Depends on: 1235506
(In reply to Jorg K (GMT+1) from comment #11)
> Why would UTF-8 be necessary, why is ISO8859-1 not good enough?

i have no idea.

you can try out the build from here http://archive.mozilla.org/pub/firefox/try-builds/ananuti@gmail.com-1c5235e686f856f58812313da6d9b1272d35757f/

pasting `naïve` into a textarea, you'll see the red underline.

if you substitute `UTF8` for `ISO8859-1`, the red underline will disappear. but we can't use UTF8 (bug 1162823).

feel free to investigate further, bug 1164263 is open.

> Let's fix all the issues in bug 1235506.

go for it :)
I don't see why I'd need a try run for adding one word to the dictionary.

I simply added "naïve" to the en-US.dic I already have on my system. I did so in Notepad++ on Windows and made sure the file encoding was "ANSI", which is ISO8859-1. "naïve" works just fine.

Your mistake was that you added the word and saved the file as UTF-8. You can see it in your patch. And surely, if you present a UTF-8 file to the spellchecker and pretend it's ISO8859-1, it ain't working ;-)

Conclusion: If we decide that we want it, "naïve" in all its variations can be added without a problem. As I suggested in bug 1235506, we should also add the word to the yet to be created "Mozilla knows better" file.
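
The mismatch described above can be reproduced outside the spellchecker. This is only an illustrative sketch in Python, not the code path Hunspell uses: it encodes the word as UTF-8 (what the first patch did) and then decodes the bytes as ISO8859-1 (what the affix file's `SET` line promises).

```python
word = "naïve"

utf8_bytes = word.encode("utf-8")        # two bytes for "ï": 0xC3 0xAF
latin1_bytes = word.encode("iso8859-1")  # one byte for "ï": 0xEF

# Reading UTF-8 bytes as if they were ISO8859-1 yields a different word,
# so a lookup for "naïve" can never match the stored entry.
misread = utf8_bytes.decode("iso8859-1")
print(repr(misread))                         # 'naÃ¯ve'
print(misread == word)                       # False

# Saving the file in the declared encoding round-trips cleanly.
print(latin1_bytes.decode("iso8859-1") == word)  # True
```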
Comment on attachment 8702227 [details] [diff] [review]
Add naïve, naïvely, naïver, naïvest, naïveness and naiveness to the en-US dictionary

Wrong encoding (UTF-8) used for the patch. Should be ISO8859-1.

In fact, the word addition is encoded in UTF-8, yet the checkin comment is in ISO8859-1:
Add na෥, na෥ly, na෥r, na෥st, na෥ness and naiveness to the en-US dictionary.
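
The garbling in the checkin comment above is the reverse mismatch: ISO8859-1 bytes shown by something that expects UTF-8. A small Python sketch of the same situation follows; Python's decoder reports or replaces the bad byte rather than swallowing the following letters, so the exact rendering differs from what the patch viewer showed.

```python
latin1_bytes = "naïve".encode("iso8859-1")  # b'na\xefve'

# 0xEF announces a three-byte UTF-8 sequence, but "v" and "e" are not
# valid continuation bytes, so strict decoding fails.
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient decoder substitutes U+FFFD for the offending byte.
lenient = latin1_bytes.decode("utf-8", errors="replace")
print(strict_ok, repr(lenient))
```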
Attachment #8702227 - Flags: feedback-
Assignee: ananuti → nobody
Attachment #8702227 - Attachment is obsolete: true
Requested at SCOWL: https://github.com/kevina/wordlist/issues/139
Status: REOPENED → NEW
Depends on: 1238031
No longer depends on: 1235506
OK, expanding the current en-US.dic file and looking for "naiv" I get 10 words:

naive
naively
naiver
naivest
naivete   <-- this is really naiveté without the accent. No need to add ï there.
naivete's <-- same here.
naivety
naivety's
naiveté
naiveté's

Therefore we should add 8 words:

naïve
naïvely
naïver
naïvest
naïvety - see https://en.wikipedia.org/wiki/Naivety
naïvety's
naïveté
naïveté's

Patch coming.
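
The selection above can be written down as a short sketch. The word list is copied from this comment; the variable names and the way of deriving the variants are illustrative only and are not how the patch was produced.

```python
# The 10 words found by expanding en-US.dic and searching for "naiv".
existing = [
    "naive", "naively", "naiver", "naivest",
    "naivete", "naivete's",
    "naivety", "naivety's",
    "naiveté", "naiveté's",
]

# "naivete" is "naiveté" with the accent dropped, so it gets no "ï" variant.
skip = {"naivete", "naivete's"}

to_add = [w.replace("naiv", "naïv", 1) for w in existing if w not in skip]
print(len(to_add))
print("\n".join(to_add))
```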
Note the ANSI/windows-1252 encoding of the patch.
Attachment #8710941 - Flags: review?(ehsan)
Changing the summary back to what it was. Accented words got added in bug 1238031.
Assignee: nobody → mozilla
Status: NEW → ASSIGNED
Summary: naïve is not in the en-US dictionary (and neither are many other accented terms that have Wikipedia entries, like Bogotá or cliché). → naïve is not in the en-US dictionary
(In reply to Jorg K (GMT+1) from comment #17)
> Note the ANSI/windows-1252 encoding of the patch.
I meant to say ISO 8859-1. Same thing for the purpose of the patch.

Details: https://en.wikipedia.org/wiki/Windows-1252
This character encoding is a superset of ISO 8859-1, but differs from the IANA's ISO-8859-1 by using displayable characters rather than control characters in the 80 to 9F (hex) range.
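
To illustrate why the two encodings are "the same thing for the purpose of the patch", here is a sketch using Python's `cp1252` and `iso8859-1` codecs. It assumes those codecs faithfully model the two encodings; bytes that `cp1252` leaves undefined are decoded with a replacement character.

```python
# Collect every byte value the two codecs decode differently.
differing = [
    b for b in range(256)
    if bytes([b]).decode("cp1252", errors="replace")
    != bytes([b]).decode("iso8859-1")
]
print(hex(min(differing)), hex(max(differing)), len(differing))

# The accented letters in the patch (ï = 0xEF, é = 0xE9) lie outside that
# range, so the added words have identical bytes in both encodings.
words = ["naïve", "naïveté"]
same_bytes = all(w.encode("cp1252") == w.encode("iso8859-1") for w in words)
print(same_bytes)
```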
Oops, forgot to update word count in the first line.
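
For context: the first line of a Hunspell `.dic` file holds the (approximate) number of entries, so adding words means bumping that number. A hypothetical helper that recomputes it, shown only as a sketch and not the procedure used for this patch:

```python
def fix_word_count(dic_text: str) -> str:
    """Rewrite the first line of a .dic file to match the number of entries.

    Hypothetical helper: counts every non-blank line after the first.
    """
    lines = dic_text.splitlines()
    entries = [ln for ln in lines[1:] if ln.strip()]
    return "\n".join([str(len(entries))] + lines[1:]) + "\n"

# A stale count of 2 after adding a third entry.
stale = "2\nnaive/Y\nnaivety/SM\nnaïve/Y\n"
print(fix_word_count(stale))
```

When applying this to a real file, read and write it with the encoding named in the affix file's `SET` line.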
Attachment #8710941 - Attachment is obsolete: true
Attachment #8710941 - Flags: review?(ehsan)
Attachment #8710996 - Flags: review?(ehsan)
Attachment #8710996 - Flags: review?(ehsan) → review+
Dear Sheriff,

this patch changes three lines in the en-US dictionary. I promise, no test will fail due to this. Please combine with other patches when landing.

Thanks.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/61910fcb5817
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46