Closed Bug 992118 Opened 10 years ago Closed 10 years ago

spell checker has no default dictionary

Categories

(Core :: Spelling checker, defect)

28 Branch
x86_64
Linux
defect
Not set
normal

Tracking


RESOLVED FIXED
mozilla32

People

(Reporter: porcelain_mouse, Assigned: ehsan.akhgari)


Attachments

(2 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:28.0) Gecko/20100101 Firefox/28.0 (Beta/Release)
Build ID: 20140319073243

Steps to reproduce:

* create new profile
* go to web-page with free text boxes
* type misspelled word


Actual results:

*  misspelled word is not detected/highlighted.
* Right-click menu shows "Check Spelling" is already selected.
* Right-Click menu > Languages > none of the listed dictionaries is selected (or, the wrong dictionary is selected, i.e. not one matching Language preference).
* When I select the dictionary I want, it works, but it is remembered only for the current URL, which is probably the correct behavior.  (This makes sense, but doesn't help me.)


Expected results:

I would have expected the spell checker to choose the installed dictionary that matches my language preference or locale or environment (LANG variable).  (Although I notice that the installed dictionary is 'en_US' while the FF Language setting is 'en-US'.)

I tried to get support: https://support.mozilla.org/en-US/questions/992002

I also see people complaining about the opposite behavior, see bug 682564.  But, then again, maybe that is the same bug.  FF default is wrong, but their default is en-US, which is what I want, and my "default" is null.  I would be happy if it reset to en-US on every new page; that's exactly what I want.
Component: Untriaged → Spelling checker
Product: Firefox → Core
So what is the expected language in your environment (what is your environment by the way?) and how do you deliver that information to Firefox?  I'm trying to understand what the "my language preference or locale or environment (LANG variable)" exactly means...
See Also: → 992944
Hi Ehsan,

I'm really confused by your question.  I see you are quite active, so I assume the confusion is mine because that phrase is very specific and clear in this context.  I'll try to answer, but if my explanation sounds pedantic, you will have to excuse me.

By "my language preference" I'm referring to the FF language preference, i.e. Preference > Content > Languages > Choose... >.  Here, I haven't changed anything; the default is correct: en-us (aka English/US).

By "environment" I mean the program environment (aka ENV) in which FF is running.  You know, all the variables that FF inherits when it starts running.

One of the environment variables is the LANG variable, which tells all programs what language I want to use.  This is actually part of my POSIX locale, which is what I mean by "locale".  As far as I know, this is the way all internationalized programs work.  FF inherits my locale, too, and I assumed it was using that to pick my language.

So, in short, the expected language is specified by my locale.

Perhaps all this is beside the point, since all of these things are set correctly, yet FF doesn't pick the right spell checking dictionary.  It doesn't pick any dictionary, actually.  But, I assume it is related because Languages and dictionaries are associated with each other by the locale identifier.  I assume this is not a coincidence.

The bug you reference (bug 992944) is quite interesting.  Perhaps that is my problem, but the symptom doesn't match, exactly.  My spellchecker.dictionary pref is set and set correctly, yet it still doesn't work.

I hope this helps.  I'm willing to troubleshoot, so please ask if you have more questions.
Any thoughts?
Sorry for the late response. :(  Please use the needinfo? flag below the comment field and set its value to ":ehsan" without the quotes to get my attention.

(In reply to Porcelain Mouse from comment #2)
> Hi Ehsan,
> 
> I'm really confused by your question.  I see you are quite active, so I
> assume the confusion is mine because that phrase is very specific and clear
> in this context.  I'll try to answer, but if my explanation sounds
> pedantic, you will have to excuse me.

No worries!  I'm basically trying to get a picture of what your system looks like to see if I can figure out what's going wrong.  The code which decides which language to use is quite complicated: <http://mxr.mozilla.org/mozilla-central/source/editor/composer/src/nsEditorSpellCheck.cpp#713> which is why I have to ask these questions.  Sorry if they sound stupid! :-)

> By "my language preference" I'm referring to the FF language preference,
> i.e. Preference > Content > Languages > Choose... >.  Here, I haven't
> changed anything; the default is correct: en-us (aka English/US).

That preference determines what language we send to the server as part of the content negotiation (see the Accept-Language HTTP header).  It doesn't affect spell checking in any way.

> By "environment" I mean the program environment (aka ENV) in which FF is
> running.  You know, all the variables that FF inherits when it starts
> running.

We do look at the LANG environment variable, but only if we have no preferred spell checking dictionary, and even then it needs to match the language of your dictionary exactly.  IOW, en-US and en_US won't be considered to be the same thing here.

Speaking of this, what are the spellchecker.dictionary pref and the LANG environment variable set to respectively?  Note that you originally mentioned that you reproduce this on an empty profile, which would make me believe that spellchecker.dictionary must be empty, but your comment here contradicts that...

> One of the environment variables is the LANG variable, which tells all
> programs what language I want to use.  This is actually part of my POSIX
> locale, which is what I mean by "locale".  As far as I know, this is the way
> all internationalized programs work.  FF inherits my locale, too, and I
> assumed it was using that to pick my language.

Right, yeah we do have code to look at this environment variable.

> So, in short, the expected language is specified by my locale.
> 
> Perhaps all this is beside the point, since all of these things are set
> correctly, yet FF doesn't pick the right spell checking dictionary.  It
> doesn't pick any dictionary, actually.  But, I assume it is related because
> Languages and dictionaries are associated with each other by the locale
> identifier.  I assume this is not coincidence.

Which website are you using for your testing BTW?  Websites can also specify a language for each text field.

> The bug you reference (bug 992944) is quite interesting.  Perhaps that is my
> problem, but the symptom doesn't match, exactly.  My spellchecker.dictionary
> pref is set and set correctly, yet it still doesn't work.

Are you on Fedora?  (Note that I don't actually know if you are indeed hitting bug 992944...)

> I hope this helps.  I'm willing to troubleshoot, so please ask if you have
> more questions.

Thanks, I may give you test builds and whatnot but let's first see if I can just guess where the bug is. :-)
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #4)

I don't mean to bug you about it.  I'm needinfo?-ing you now, but there's no rush.

> No worries!  I'm basically trying to get a picture of what your system looks
> like to see if I can figure out what's going wrong.  The code which decides
> which language to use is quite complicated:
> <http://mxr.mozilla.org/mozilla-central/source/editor/composer/src/
> nsEditorSpellCheck.cpp#713> which is why I have to ask these questions. 
> Sorry if they sound stupid! :-)

On the contrary, the confusion was in fact mine; that's clear now.  I just didn't pick up on what you were driving at.

> That preference determines what language we send to the server as part of
> the content negotiation (see the Accept-Language HTTP header).  It doesn't
> affect spell checking in any way.

Ah ha, okay; I didn't realize that.

> We do look at the LANG environment variable, but only if we have no
> preferred spell checking dictionary, and even then it needs to match exactly with the
> language of your dictionary.  IOW, en-US and en_US won't be considered to be
> the same thing here.

I read that in another bug.  I just don't understand where the incorrect form could be coming from.

> Speaking of this, what are the spellchecker.dictionary pref and the LANG
> environment variable set to respectively?  Note that you originally
> mentioned that you reproduce this on an empty profile, which would make me
> believe that spellchecker.dictionary must be empty, but your comment here
> contradicts that...

LANG=en_US.UTF-8
and
spellchecker.dictionary;en_US

I see that spellchecker.dictionary is set to a non-default value, but I don't remember setting it.  I am confident I didn't set it manually in previous profiles as I just found it recently.  Could it have been set the first time I used the contextual menu Languages option?  I did that for this bug page, only; just to report the bug.  I've been careful not to set it for any other pages so that I don't run out of convenient test pages.

Of course, neither 'en_US' nor the default, null, work.  So, picking the dictionary from my LANG env is also not working for some reason.

> Which website are you using for your testing BTW?  Websites can also specify
> a language for each text field.

I've mainly been using the RedHat bugzilla:

https://bugzilla.redhat.com/

which specifies <html lang="en">

and another internal page that you wouldn't be able to get, which specifies <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-us" lang="en-us">

Could I just be unlucky enough to hit a lot of pages which specify the wrong language?  Is there a test page I could try that should work for me?  ....Hmm, I just copied the RedHat bugzilla page and removed the 'lang=en' and now that local copy works fine.  Geez.  So, now I'm really interested in this dash instead of an underscore.  From where does that come!?

> > The bug you reference (bug 992944) is quite interesting.  Perhaps that is my
> > problem, but the symptom doesn't match, exactly.  My spellchecker.dictionary
> > pref is set and set correctly, yet it still doesn't work.
> 
> Are you on Fedora?  (Note that I don't actually know if you are indeed
> hitting bug 992944...)

Yes.  Well, now I see the connection to that bug.  So, to what is spellchecker.dictionary supposed to be set on Linux if that is to override these wrong page settings?  I guess that is all I need to know.  Obviously, the LANG patch would be nice, but in the meantime, sounds like what I want is the secret value for this setting.  It can't be en_US, though.

> Thanks, I may give you test builds and whatnot but let's first see if I can
> just guess where the bug is. :-)

Sounds good.
Flags: needinfo?(ehsan)
(In reply to Porcelain Mouse from comment #5)
> > Speaking of this, what are the spellchecker.dictionary pref and the LANG
> > environment variable set to respectively?  Note that you originally
> > mentioned that you reproduce this on an empty profile, which would make me
> > believe that spellchecker.dictionary must be empty, but your comment here
> > contradicts that...
> 
> LANG=en_US.UTF-8
> and
> spellchecker.dictionary;en_US
> 
> I see that spellchecker.dictionary is set to a non-default value, but I
> don't remember setting it.  I am confident I didn't set it manually in
> previous profiles as I just found it recently.  Could it have been set the
> first time I used the contextual menu Languages option?  I did that for this
> bug page, only; just to report the bug.  I've been careful not to set it for
> any other pages so that I don't run out of convenient test pages.
> 
> Of course, neither 'en_US' nor the default, null, work.  So, picking the
> dictionary from my LANG env is also not working for some reason.

Hmm, yeah this _could_ be a problem.  When parsing the LANG environment variable we skip the encoding part (the part past the dot) but we don't attempt to normalize the previous part, so we'll read "en_US" out of that and don't end up matching that with "en-US".  Not sure how your pref is set to "en_US".  Can you try changing it to "en-US" and see if that fixes things?
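To make the normalization gap concrete, here is a minimal C++ sketch of what reading LANG with normalization could look like. It is not the code in nsEditorSpellCheck.cpp: the function name LangEnvToTag is hypothetical, and the sketch assumes LANG has the usual language_TERRITORY.codeset form, optionally followed by an @modifier.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not Gecko code): turns a POSIX LANG value such as
// "en_US.UTF-8" into a dash-separated tag such as "en-US".
std::string LangEnvToTag(const std::string& envLang) {
  // Drop the trailing ".codeset" part, as Gecko already does.
  // find() returns npos when there is no dot; substr(0, npos) keeps it all.
  std::string tag = envLang.substr(0, envLang.find('.'));
  // Also drop an "@modifier" suffix if one is present.
  tag = tag.substr(0, tag.find('@'));
  // The missing normalization step: POSIX separates language and
  // territory with '_', while the dictionary names compared here use '-'.
  for (char& c : tag) {
    if (c == '_') {
      c = '-';
    }
  }
  return tag;
}
```

With this step, LANG=en_US.UTF-8 yields "en-US", which compares equal to a dictionary named "en-US"; without the '_' to '-' replacement it yields "en_US", which does not.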

> > Which website are you using for your testing BTW?  Websites can also specify
> > a language for each text field.
> 
> I've mainly been using the RedHat bugzilla:
> 
> https://bugzilla.redhat.com/
> 
> which specifies <html lang="en">
> 
> and another internal page that you wouldn't be able to get, which specifies
> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-us" lang="en-us">

Try <http://people.mozilla.org/~eakhgari/992118.html>?  That is a test page without any lang specifications which I just created.

> Could I just be unfortunate to hit a lot of pages which specify the wrong
> language?  Is there a test page I could try that should work for me? 
> ....Hmm, I just copied the RedHat bugzilla page and removed the 'lang=en'
> and now that local copy works fine.  Geez.  So, now I'm really interested in
> this dash instead of an underscore.  From where does that come!?

I can't tell you where those underscores come from.  :-)

What dictionaries do you have installed?  They should appear under Tools -> Add-ons.  Can you also ls /usr/share/myspell and /usr/lib64/firefox/dictionaries?

> > > The bug you reference (bug 992944) is quite interesting.  Perhaps that is my
> > > problem, but the symptom doesn't match, exactly.  My spellchecker.dictionary
> > > pref is set and set correctly, yet it still doesn't work.
> > 
> > Are you on Fedora?  (Note that I don't actually know if you are indeed
> > hitting bug 992944...)
> 
> Yes.  Well, now I see the connection to that bug.  So, to what is
> spellchecker.dictionary supposed to be set on Linux if that is to override
> these wrong page settings?

The spellchecker.dictionary pref is the fallback, not an override, so we'll prefer what websites tell us over that.

> I guess that is all I need to know.  Obviously,
> the LANG patch would be nice, but in the meantime, sounds like what I want
> is the secret value for this setting.  It can't be en_US, though.

What I would ultimately like you to test is to find out what one of your dictionaries is called, set that pref to that name, and try that people.mozilla.org link again and see if it fixes your issue.  If it does, then there is a good chance that this is a dupe of bug 992944.
Flags: needinfo?(ehsan)
Also, can you please take a look at https://bugzilla.redhat.com/show_bug.cgi?id=439598 and see if your experience matches others' in that bug?
> Hmm, yeah this _could_ be a problem.  When parsing the LANG environment
> variable we skip the encoding part (the part past the dot) but we don't
> attempt to normalize the previous part, so we'll read "en_US" out of that
> and don't end up matching that with "en-US".  Not sure how your pref is set
> to "en_US".  Can you try changing it to "en-US" and see if that fixes things?

So, no, that doesn't help when a page specifies "en-US".  When a page doesn't specify, everything works fine, regardless of how I have it configured.

> Try <http://people.mozilla.org/~eakhgari/992118.html>?  That is a test page
> without any lang specifications which I just created.

Yeah, that seems to work for me.

> I can't tell you where those underscores come from.  :-)

Ah ha!  Now we see the violence inherent in the system! ;-)  As far as I can tell, the underscore is the traditional POSIX separator.  It's been underscore on my system as long as I can remember, which is 15+ years.

> What dictionaries do you have installed?  They should appear under Tools ->
> Add-ons.  Can you also ls /usr/share/myspell and
> /usr/lib64/firefox/dictionaries?

Yeah, I looked that up for the support forum when I reported this problem there.  Here are my files:

/usr/share/myspell/en_US.aff
/usr/share/myspell/en_US.dic

and

/usr/lib64/firefox/dictionaries/en_US.aff
/usr/lib64/firefox/dictionaries/en_US.dic


> The spellchecker.dictionary pref is the fallback, not an override, so we'll
> prefer what websites tell us over that.

Oh, I see.

> What I would ultimately like you to test is to find out what one of your
> dictionaries is called, set that pref to that name, and try that
> people.mozilla.org link again and see if it fixes your issue.  If it does,
> then there is a good chance that this is a dupe of bug 992944.

Well, that page you made works in all cases for me.  If I have spellchecker.dictionary set to "en_US", which is my installed dictionary, it works.  If I have no spellchecker.dictionary setting at all, that page also works.  It works if I have spellchecker.dictionary set to "en-US", too, which isn't a dictionary I have installed.  The problem is when the page specifies "en" or "en-US"; that doesn't work regardless of how I have FF configured.

From what I see on the first couple of links I found searching, it sounds like the POSIX LANG variable is a composite field.  That is, 'en-US' doesn't mean 'en-US', it means language 'en' and country code 'US'.  So, why doesn't the library find a matching, installed dictionary, which I have, when 'en-US' or 'en' is specified?  Isn't there a library call that will canonicalize the name for you?  I'm guessing this separator business is handled internally by some library.
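For reference, a POSIX locale name has the general form language[_territory][.codeset][@modifier], so en_US.UTF-8 means language "en", territory "US", codeset "UTF-8". Below is a minimal C++ sketch that splits such a name into its parts; PosixLocale and ParsePosixLocale are hypothetical names, and the sketch does no validation of the individual fields.

```cpp
#include <cassert>
#include <string>

// Components of a POSIX locale name: language[_territory][.codeset][@modifier]
struct PosixLocale {
  std::string language;
  std::string territory;
  std::string codeset;
  std::string modifier;
};

PosixLocale ParsePosixLocale(const std::string& name) {
  PosixLocale loc;
  std::string rest = name;
  // Peel components off from the right, since "@modifier" comes last.
  if (auto at = rest.find('@'); at != std::string::npos) {
    loc.modifier = rest.substr(at + 1);
    rest.resize(at);
  }
  if (auto dot = rest.find('.'); dot != std::string::npos) {
    loc.codeset = rest.substr(dot + 1);
    rest.resize(dot);
  }
  if (auto us = rest.find('_'); us != std::string::npos) {
    loc.territory = rest.substr(us + 1);
    rest.resize(us);
  }
  // Whatever is left is the language code.
  loc.language = rest;
  return loc;
}
```

Comparing only the language field would let a page that says "en" find an installed en_US dictionary, which is the behavior the reporter expected.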

And it should work on 'en' in any case, since that doesn't include any separator business.  Hmm, yeah, I think the delimiter is a red herring; shouldn't 'en' match, since I have one English dictionary installed?
(In reply to Porcelain Mouse from comment #8)
> > Hmm, yeah this _could_ be a problem.  When parsing the LANG environment
> > variable we skip the encoding part (the part past the dot) but we don't
> > attempt to normalize the previous part, so we'll read "en_US" out of that
> > and don't end up matching that with "en-US".  Not sure how your pref is set
> > to "en_US".  Can you try changing it to "en-US" and see if that fixes things?
> 
> So, no, that doesn't help when a page specifies "en-US".  When a page
> doesn't specify, everything works fine, regardless of how I have it
> configured.

That's good to know.

> > I can't tell you where those underscores come from.  :-)
> 
> Ah ha!  Now we see the violence inherent in the system! ;-)  As far as I can
> tell, the underscore is the traditional POSIX separator.  It's been
> underscore on my system as long as I can remember, which is 15+ years.

I'm not trying to pick sides here.  ;-)  I'm just saying that Gecko has never respected these underscores and has always used a dash here.  Actually, I think that's a bug; we should accept underscores the same way we accept dashes.

Actually looking at the code I do know where the underscore in your pref comes from.  What happens is that we look at LANG, and then try to load a dictionary with that name, which will succeed the first time, and then we stick the name of the loaded dictionary in the pref.  What that breaks is partial language name matching if a website later specifies "en" or "en-US" as the language of their textfield, which seems to match exactly what you're seeing.
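The partial matching mentioned here is a "dash match": a requested tag such as "en" matches "en" itself or any name that continues with "-", such as "en-US". The C++ sketch below is a simplified, hypothetical stand-in (DashMatch is not Gecko's nsStyleUtil::DashMatchCompare, whose real signature may differ); it shows why "en" finds "en-US" but not an underscore name like "en_US".

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: does `candidate` equal `prefix`, or start with
// `prefix` immediately followed by '-'? Compared case-insensitively,
// since language tags are case-insensitive.
bool DashMatch(const std::string& candidate, const std::string& prefix) {
  if (candidate.size() < prefix.size()) {
    return false;
  }
  for (std::string::size_type i = 0; i < prefix.size(); ++i) {
    if (std::tolower(static_cast<unsigned char>(candidate[i])) !=
        std::tolower(static_cast<unsigned char>(prefix[i]))) {
      return false;
    }
  }
  // Either an exact match, or the next character must be the '-' separator.
  // An '_' here (as in "en_US") makes the match fail.
  return candidate.size() == prefix.size() || candidate[prefix.size()] == '-';
}
```

Once the pref holds "en_US", a page that asks for "en" has nothing it can dash-match against, which fits the symptom reported in this bug.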

So I guess that concludes our investigation!  I'll prepare a patch and a test build for you shortly.
Assignee: nobody → ehsan
Attachment #8414096 - Flags: review?(bugs)
Can you please download the build here <http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/eakhgari@mozilla.com-9a2d82f97e88/try-linux/> and see if it fixes the bug for you?  I'd really appreciate it if you could try both with a new profile and with one of the old profiles that is affected by this issue.  I added code both to handle the LANG environment variable in new profiles and to deal with prefs that have an underscore in them, to take care of both situations.

Thanks!
Jan, please see this bug too, this is reported by a Fedora user, and it seems like the issue here is that we fail to properly handle the underscores in the dictionary names.
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #11)
> Can you please download the build here
> <http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/eakhgari@mozilla.
> com-9a2d82f97e88/try-linux/> and see if it fixes the bug for you?

Yes, I will gladly test it!  Thanks so much for banging out this patch so fast.  Wow!  Please give me a day or two to test all the cases you requested; then I'll get right back to you.

Again, many thanks.  More soon.
Attachment #8414096 - Flags: review?(bugs) → review+
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Flags: needinfo?(porcelain_mouse)
I'm not sure what I'm supposed to download.  I see source for 32.0a1, is that it?  I don't see any builds there for me; I have Fedora x86_64.
Flags: needinfo?(porcelain_mouse)
Oh, sorry, it has a compiled version.  Let me try...
Try: <http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/eakhgari@mozilla.com-9a2d82f97e88/try-linux64/firefox-32.0a1.en-US.linux-x86_64.tar.bz2>

It's not an rpm package, just a tarball which you can extract anywhere on your homedir.  There should be an executable in there called firefox.
Okay, I tried three things:

New profile, with your test page and a page that specifies lang=en and a page that specifies lang=en-US:  all work fine

Old Profile, with your test page and a page that specifies lang=en: all work fine

Old Profile, with spellchecker.dictionary=en_US and your test page and a page that specifies lang=en: all work fine

So, that looks perfect from my perspective.
Thanks a lot for testing, Porcelain Mouse!

http://hg.mozilla.org/integration/mozilla-inbound/rev/d411b8472391
https://hg.mozilla.org/mozilla-central/rev/d411b8472391
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Well, this fix doesn't work for me because your test package uses the internal dictionaries, not the system ones. Fedora links the 'dictionaries' directory to '/usr/share/myspell', which contains files like en_GB.dic, en_CA.dic, en_US.dic and the matching .aff files. And here comes the problem with the underscore: matching fails in nsStyleUtil::DashMatchCompare (here: http://mxr.mozilla.org/mozilla-central/source/editor/composer/src/nsEditorSpellCheck.cpp#829 ) because '-' is expected, and with '_' it never succeeds.

So if you have a German-localized browser ('de'), the 'de_DE' dictionary is ignored.
Attaching a patch which tries to find the dictionary with '_' and uses only '-' as the separator in the dictionary list.

This actually fixes the issues in Fedora. Please take a look; I just started a try run with the patch: https://tbpl.mozilla.org/?tree=Try&rev=235031248590
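The approach described for this patch and in the review comments that follow can be modeled roughly like this: dictionary names are always exposed with '-', and loading falls back to the underscore spelling when the dash name is not found. This is a hypothetical C++ sketch, not the attached patch; std::map stands in for the mDictionaries table, and ReplaceChar, ListDictionaries and FindAffFile are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the dictionary table: name as found on disk
// (e.g. "en_US" for /usr/share/myspell/en_US.aff) -> path of the .aff file.
using DictMap = std::map<std::string, std::string>;

// Returns a copy of `s` with every `from` character replaced by `to`.
std::string ReplaceChar(std::string s, char from, char to) {
  std::replace(s.begin(), s.end(), from, to);
  return s;
}

// Listing: always report names with '-', so "en_US" on disk is exposed as
// "en-US" and dash matching against page languages can succeed.
std::vector<std::string> ListDictionaries(const DictMap& dicts) {
  std::vector<std::string> names;
  for (const auto& entry : dicts) {
    names.push_back(ReplaceChar(entry.first, '_', '-'));
  }
  return names;
}

// Loading: try the requested '-' name first, then fall back to the '_'
// spelling used by the Fedora system dictionaries. Empty string = not found.
std::string FindAffFile(const DictMap& dicts, const std::string& name) {
  auto it = dicts.find(name);
  if (it == dicts.end()) {
    it = dicts.find(ReplaceChar(name, '-', '_'));
  }
  return it != dicts.end() ? it->second : std::string();
}
```

The design keeps a single '-' spelling everywhere outside the dictionary loader, so matching against page languages keeps working, while on-disk names such as en_US.aff remain loadable.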
Attachment #8435685 - Flags: feedback?(ehsan)
Comment on attachment 8435685 [details] [diff] [review]
Support dictionaries with underscore

Review of attachment 8435685 [details] [diff] [review]:
-----------------------------------------------------------------

Looks mostly good, minusing because of the nsEditorSpellCheck change.

::: editor/composer/src/nsEditorSpellCheck.cpp
@@ +848,5 @@
>          nsString lang = NS_ConvertUTF8toUTF16(env_lang);
>          // Strip trailing charset if there is any
>          int32_t dot_pos = lang.FindChar('.');
>          if (dot_pos != -1) {
> +          lang = Substring(lang, 0, dot_pos);

This change seems unrelated to the problem you're trying to fix, and is wrong.  This breaks dictionaries with names such as "en-US.utf8" right?

::: extensions/spellcheck/hunspell/src/mozHunspell.cpp
@@ +171,5 @@
>    }
>  
>    nsIFile* affFile = mDictionaries.GetWeak(nsDependentString(aDictionary));
> +  if (!affFile) {
> +    nsString replacedStr(aDictionary);

Please add a comment saying something like this:

"Support loading Fedora system dictionaries which use names such as en_US.aff as opposed to en-US.aff"

@@ +309,5 @@
>  static PLDHashOperator
>  AppendNewString(const nsAString& aString, nsIFile* aFile, void* aClosure)
>  {
>    AppendNewStruct *ans = (AppendNewStruct*) aClosure;
> +  nsString replacedStr(aString);

Please add a comment saying "Restore the dictionary name on Fedora (see SetDictionary)"
Attachment #8435685 - Flags: feedback?(ehsan) → feedback-
Hi Ehsan,

Hey, I can't remember why I thought your patch would make it into FF31, but 31 just came to Fedora and I downloaded it with great anticipation.  Did it get incorporated for this release?

Well, FF31 works on my test page with lang=en and lang=en_US.  That's an improvement and I think it's working on pages that it didn't before, like this one, e.g.

But, it doesn't work on pages with lang=en-US.  From what I see in your patch, it looks like it was trying to make that work, though I could be wrong.  I'm afraid we might not have tested this case.  :-(

Thank you so much for your help on this.  Just wanted to make sure what got implemented was what you intended.  Let me know if you have a moment to check up on this with me.
Flags: needinfo?(ehsan)
No, sorry, this landed for Firefox 32, which will come out in less than 6 weeks now.  So I wouldn't expect any changes to Firefox 31 based on my patch here.
Flags: needinfo?(ehsan)
Oh dear...I have FF32 now and my test page (which is really the test page you created, I just copied it) with lang=en-US still doesn't work.  Let me know if you can look into this with me, again.
Flags: needinfo?(ehsan.akhgari)
(In reply to Porcelain Mouse from comment #25)
> Oh dear...I have FF32 now and my test page (which is really the test page
> you created, I just copied it) with lang=en-US still doesn't work.  Let me
> know if you can look into this with me, again.

Can you please file another bug and include as much information as you can?  I'm pretty sure we fixed this bug, but there might be others to fix.
Flags: needinfo?(ehsan.akhgari)
See Also: → 1097550