Closed Bug 1144254 Opened 6 years ago Closed 5 years ago
naïve is not in the en-US dictionary
It should be
(Comment 1 is incorrect, as discussed in bug 1183512 comment 2. Reopening & tagging that comment as obsolete.)
Summary: naïve is not in the dictionary → naïve is not in the en-US dictionary
Looks like the spelling with the diaeresis has its merit: https://en.wiktionary.org/wiki/na%C3%AFve Also: naïvely, naïveness, naïveté. Ekanan, can you please make this change? As I said in comment #2, "naïve" is in the British dictionary and I don't see why it shouldn't be in the US dictionary.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
OK, both OFD and M-W have this in en-US. patch coming.
Assignee: nobody → ananuti
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1c5235e686f8 Let's see if the `ï` breaks things.
Comment on attachment 8702227 [details] [diff] [review] Add naïve, naïvely, naïver, naïvest, naïveness and naiveness to the en-US dictionary
Unfortunately, we can't fix this bug without UTF-8 in the affix file. *sigh* If we use UTF-8, the spellchecker will treat non-Latin words as misspelled. See bug 1162823. :( Maybe WONTFIX?
Can you please elaborate further?

> Unfortunately, we can't fix this bug without UTF-8 in the affix file.

So why don't we use UTF-8?

Is there a word missing?
> If we use UTF-8, the spellchecker will treat non-Latin words as [not?] misspelled.

I assume "naïve" classifies as non-Latin. Looking at en-GB.dic (the one maintained by Marco A.G.Pinto), I see:
naive/YT
naiveness
naivete/Z
naivety/SM
naiveté/SM
naïve/Y
naïveness
naïvety/S
naïveté/S

So why does it work there? I'm using the GB dictionary to write this comment and it works just fine. en-GB.aff has:
SET UTF-8

The other word typically spelled with an accent is "résumé" (as in CV): https://en.wikipedia.org/wiki/R%C3%A9sum%C3%A9 In the GB dictionary I see:
résumé/S

In fact, if you grep for á, é, í or ó in the GB dictionary, you find heaps, like: Bogotá (https://en.wikipedia.org/wiki/Bogot%C3%A1, capital of Colombia, country in South America) or cliché (https://en.wikipedia.org/wiki/Clich%C3%A9).

With all due respect to you and Ehsan, I think we should move the maintenance of the dictionary out of Core::Spelling Checker. This is really a community effort and shouldn't involve (busy) core developers. I think the French model is great (from bug 1229406): You request words here http://www.dicollecte.org/dictionary.php?prj=fr and it gets done for you.
Summary: naïve is not in the en-US dictionary → naïve is not in the en-US dictionary (and neither are many other accented terms that have Wikipedia entries, like Bogotá or cliché).
OK, perhaps comparing to en-GB is not the right thing to do. So let's compare to the add-on en-US dictionary from https://addons.mozilla.org/en-US/firefox/addon/united-states-english-spellche/.

Affix file says:
SET ISO8859-1
(so Latin with some accented characters, etc.).

Now let's look for some words:
clichéd
cliché/SM
(heaps of words with é)

They also have Bogotá. And they have:
naive/SRTYP
naiveté/SM
naivety/MS

They don't have naïve. Anyway, I don't see why the en-US dictionary that ships with Mozilla products should be worse than others.
Why would UTF-8 be necessary, why is ISO8859-1 not good enough? Let's fix all the issues in bug 1235506.
Depends on: 1235506
(In reply to Jorg K (GMT+1) from comment #11)
> Why would UTF-8 be necessary, why is ISO8859-1 not good enough?

I have no idea. You can try out the build from here http://firstname.lastname@example.org/ Pasting `naïve` into a textarea, you'll see the red underline. If you substitute `ISO8859-1` by `UTF8`, the red underline will disappear. But we can't use UTF8 (bug 1162823). Feel free to investigate further; bug 1164263 is open.

> Let's fix all the issues in bug 1235506.

Go for it :)
I don't see why I'd need a try run for adding one word to the dictionary. I simply added "naïve" to the en-US.dic I already have on my system. I did so in Notepad++ on Windows and made sure the file encoding was "ANSI", which is ISO8859-1. "naïve" works just fine. Your mistake was that you added the word and saved the file as UTF-8. You can see it in your patch. And surely, if you present a UTF-8 file to the spellchecker and pretend it's ISO8859-1, it ain't working ;-) Conclusion: If we decide that we want it, "naïve" in all its variations can be added without a problem. As I suggested in bug 1235506, we should also add the word to the yet to be created "Mozilla knows better" file.
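The mismatch described in this comment can be reproduced outside the spellchecker. The following is a minimal Python sketch, assuming only that the dictionary is declared as ISO8859-1 while the new entry was saved as UTF-8. The variable names are illustrative, and nothing here is the spellchecker's actual code.

```python
# Illustrative only: mimics reading a .dic entry with the wrong encoding.
word = "naïve"

# The same entry as written by an editor set to UTF-8 vs. ISO8859-1.
saved_as_utf8 = word.encode("utf-8")        # ï becomes two bytes: C3 AF
saved_as_latin1 = word.encode("iso8859-1")  # ï becomes one byte: EF

# A reader that trusts "SET ISO8859-1" maps every byte to one character.
seen_from_utf8_file = saved_as_utf8.decode("iso8859-1")
seen_from_latin1_file = saved_as_latin1.decode("iso8859-1")

print(repr(seen_from_utf8_file))    # two characters appear where ï should be
print(repr(seen_from_latin1_file))  # the intended word

# Only the ISO8859-1 file yields an entry that matches what the user typed.
print(seen_from_utf8_file == word, seen_from_latin1_file == word)
```

Under these assumptions, the UTF-8 file contributes a different six-character string to the word list, so the real five-character word is still flagged.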
Comment on attachment 8702227 [details] [diff] [review] Add naïve, naïvely, naïver, naïvest, naïveness and naiveness to the en-US dictionary Wrong UTF-8 encoding used for the patch. Should be ISO8859-1. In fact, the word addition is encoded in UTF-8, yet the checkin comment is in ISO8859-1: Add na, naly, nar, nast, naness and naiveness to the en-US dictionary.
Attachment #8702227 - Flags: feedback-
Requested at SCOWL: https://github.com/kevina/wordlist/issues/139
Status: REOPENED → NEW
OK, expanding the current en-US.dic file and looking for "naiv" I get 10 words:
naive
naively
naiver
naivest
naivete <-- this is really naiveté without the accent. No need to add ï there.
naivete's <-- same here.
naivety
naivety's
naiveté
naiveté's

Therefore we should add 8 words:
naïve
naïvely
naïver
naïvest
naïvety - see https://en.wikipedia.org/wiki/Naivety
naïvety's
naïveté
naïveté's

Patch coming.
Note the ANSI/windows-1252 encoding of the patch.
Changing the summary back to what it was. Accented words got added in bug 1238031.
Assignee: nobody → mozilla
Status: NEW → ASSIGNED
Summary: naïve is not in the en-US dictionary (and neither are many other accented terms that have Wikipedia entries, like Bogotá or cliché). → naïve is not in the en-US dictionary
(In reply to Jorg K (GMT+1) from comment #17)
> Note the ANSI/windows-1252 encoding of the patch.

I meant to say ISO 8859-1. Same thing for the purpose of the patch. Details: https://en.wikipedia.org/wiki/Windows-1252
This character encoding is a superset of ISO 8859-1, but differs from the IANA's ISO-8859-1 by using displayable characters rather than control characters in the 80 to 9F (hex) range.
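The "same thing for the purpose of the patch" point can be checked with Python's built-in codecs. This is a small sketch, assuming Python's `cp1252` and `iso8859-1` codecs stand in for what the editor calls "ANSI" and for the dictionary's declared encoding:

```python
# The accented letters relevant to this bug encode to the same single byte
# in both encodings, so a patch saved as "ANSI" is byte-identical.
for ch in "ïéá":
    assert ch.encode("cp1252") == ch.encode("iso8859-1")

words = ["naïve", "naïvety's", "naïveté"]
same_bytes = all(w.encode("cp1252") == w.encode("iso8859-1") for w in words)

# The encodings only diverge in the 0x80-0x9F range, for example byte 0x80.
byte = b"\x80"
as_cp1252 = byte.decode("cp1252")     # a displayable character (euro sign)
as_latin1 = byte.decode("iso8859-1")  # a C1 control character, U+0080

print(same_bytes, repr(as_cp1252), repr(as_latin1))
```

Since none of the added words uses a byte in the 80 to 9F range, either label describes the patch equally well.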
Oops, forgot to update word count in the first line.
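For context, a Hunspell-style .dic file starts with a line giving the number of entries, which is why adding words also touches line one. The following is a rough Python sketch of keeping that count in step. The helper `add_entries` is hypothetical, the sample entries and flags are toy data borrowed from the listings quoted earlier in this bug, and sorting the entries is a simplification rather than the real file's ordering.

```python
def add_entries(dic_lines, new_entries):
    """Return new .dic lines with entries added and the count line refreshed.

    Assumes line 0 holds the entry count and every other line is one entry.
    """
    entries = set(dic_lines[1:])
    entries.update(new_entries)
    body = sorted(entries)  # simplification: real files may be ordered differently
    return [str(len(body))] + body


# Toy dictionary: a count line followed by two entries.
lines = ["2", "naive/YT", "naivety/SM"]
updated = add_entries(lines, ["naïve/Y", "naïveté/S"])

# Write back in the encoding the affix file declares (ISO8859-1 here).
payload = "\n".join(updated).encode("iso8859-1")

print(updated[0])  # the refreshed count line
```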
Dear Sheriff, this patch changes three lines in the en-US dictionary. I promise, no test will fail due to this. Please combine with other patches when landing. Thanks.