Closed Bug 338427 Opened 18 years ago Closed 13 years ago

Spellchecker should respect @lang

Categories

(Core :: Spelling checker, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla8

People

(Reporter: pascalc, Assigned: arno)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: dev-doc-complete)

Attachments

(4 files, 13 obsolete files)

1.21 KB, text/html
Details
36.24 KB, patch
ehsan.akhgari
: review+
Details | Diff | Splinter Review
7.22 KB, patch
ehsan.akhgari
: review+
Details | Diff | Splinter Review
9.48 KB, patch
ehsan.akhgari
: review+
Details | Diff | Splinter Review
Tested with Bonecho apha 2 on Windows

1 have French dictionnary installed
2 go to a French page and fill in a form like in http://www.chevrelbureau.com/contact/

expected result : the inline spellchecker should detect the page language and use the French dictionnary

actual result : the English dictionnary (presumably last used dictionnary) is used and all words are underlined as mispelt

Firefox should check the presence of the lang="fr" attribute inherited by this textarea field, in the case of this example the page has <html lang="fr>, but if the page had <html lang="fr> and later <textarea lang="en"> (a bilingual blog like glazou's for instance), then the Englishb spellchecker should be used by Firefox.
Are there plans to add this? I thought it was obvious but I just discovered it's not there yet.

Would be a horrible waste of a perfectly good html attribute, if there was a FF version with spellcheck but not knowing how to use it.

blocking-firefox2?
Flags: blocking-firefox2?
At this point, I wouldn't block on this, but it'd be nice to have for those bilingual people out there.
Flags: blocking-firefox2? → blocking-firefox2-
Can someone tell me if there is some spellcheck "initialization" code (hopefully in js?) which selects the dictionary?

I'd like to try to write a loop that iterates parents (in DOM) until it finds the "lang" attribute (or reaches top-level and falls back to last dictionary used).
OK dictionary is selected here: http://lxr.mozilla.org/mozilla1.8/source/editor/composer/src/nsEditorSpellCheck.cpp#189

But it is my understanding that this nsEditorSpellCheck::InitSpellChecker() is called only once per browser instance? Would be really sucky, because this would mean that current dictionary is a per-browser setting and I can't even compose two messages in two languages at the same time?? (with different dictionaries in different windows/tabs/textareas).

:/

Is this correct?
Additionally there could be a check for the top 5 most used words in a certain language in the page, and also the URL could point to a language.
I could not find <html lang= in any page with a comment box or forum that I tried. 
If it can't be implemented hard-coded, hopefully some kind and brainy person will make an extension for this when 2.0 comes out.
*** Bug 344760 has been marked as a duplicate of this bug. ***
(In reply to comment #4)
> But it is my understanding that this nsEditorSpellCheck::InitSpellChecker() is
> called only once per browser instance?

nsEditorSpellCheck is no service and you get one instance per input/textarea element for which spell checking is possible. InitSpellChecker is thus called once per element and would thus be the appropriate place for figuring out the page's default language.

You should be able to start searching from aEditor->GetRootElement. Make sure to consider HTML's lang attribute and xml:lang (whichever is appropriate) and as a bonus it would be nice to also select not completely specified languages (e.g. select es-ES for lang="es").

(In reply to comment #2)
> but it'd be nice to have for those bilingual people out there.

Of which there are an awful lot on the English dominated Internet of a polyglot world. :)
(In reply to comment #7)
> nsEditorSpellCheck is no service and you get one instance per input/textarea
> element for which spell checking is possible. InitSpellChecker is thus called
> once per element and would thus be the appropriate place for figuring out the
> page's default language.
> 
> You should be able to start searching from aEditor->GetRootElement. Make sure
> to consider HTML's lang attribute and xml:lang (whichever is appropriate) and
> as a bonus it would be nice to also select not completely specified languages
> (e.g. select es-ES for lang="es").

Definitely take a look at how the spellcheck attribute support is implemented as well, since this is another case where the editor needs to do the right thing based on page content.  (Although that's much more complex than this, in that it needs to support all kinds of bizarre dynamic behavior and overriding.)
Or it could use the domain suffix.

Example, www.example.se it would use Swedish dictionary, if I goto www.example.it it would use Italian dictionary.
(In reply to comment #9)
> Or it could use the domain suffix.
> 
> Example, www.example.se it would use Swedish dictionary, if I goto
> www.example.it it would use Italian dictionary.
> 

Not at all. There are plenty of countries speaking several official languages (Canada for example) and there are also no reason that a specific site to use an official language of its suffix. Finally, how .com, .org and other non-country related suffixes should be handled?
Finding the language by recursively going up in DOM is the easy part. Having Firefox able to use different languages in different textboxes at the same time doesn't seem to work, though.
I'll give you my 2c.

Whatever language you end up with by looking at the lang variable etc, there should be a button/dropdown somewhere logical where the user can override the setting. After all, not all webmasters know what they are doing..

It should also be easy to install dictionaries for new languages.
(In reply to comment #7)
> Make sure
> to consider HTML's lang attribute and xml:lang (whichever is appropriate) and
> as a bonus it would be nice to also select not completely specified languages
> (e.g. select es-ES for lang="es").

(In reply to comment #9)
> Or it could use the domain suffix.
> 
> Example, www.example.se it would use Swedish dictionary, if I goto
> www.example.it it would use Italian dictionary.

(In reply to comment #12)
> Whatever language you end up with by looking at the lang variable etc, there
> should be a button/dropdown somewhere logical where the user can override the
> setting. After all, not all webmasters know what they are doing..

I've implemented these three points in this extension (my first one):
https://addons.mozilla.org/firefox/3414/

The problem is, I don't know how to change the language for given form fields. Hence you have to reload the page whenever the language is changed manually or automatically (at least when listening to DOMContentLoaded, which fires after the spellcheckers are instantiated). Maybe somebody can help me out here.
*** Bug 357976 has been marked as a duplicate of this bug. ***
*** Bug 358145 has been marked as a duplicate of this bug. ***
(In reply to comment #7)

> 
>  Make sure
> to consider HTML's lang attribute and xml:lang (whichever is appropriate) 

What should be done in case those two attributes are set on the same tags, and are conflicting ?
Ok, this should not happen, but this could happen, and surely already happens on some bad coded pages. 

Is one of those attributes to be picked randomly, or are they to be discarded ?
If so, should search for a lang attribute be continued in parent tags or no ?
(In reply to comment #16)
> What should be done in case those two attributes are set on the same tags, and
> are conflicting ?

When sent as text/html, xml:lang shouldn't receive attention. When sent as XML, lang is obsolete and should at least have a lower priority, if it's not ignored at all.
*** Bug 359709 has been marked as a duplicate of this bug. ***
When I issued bug 359709 I had something else in mind, this bug, however, may be enhanced with those ideas too. 

Here they are (more detailed. My fault not to did so earlier):

We could write code to automatically detect the language. I mean, in the absence of lang and xml:lang attributes - no other information is used besides the text itself. 

There are several techniques that can be used - a simple one: count the letters frequencies in the text. There are lots of paper on this topic. You can search Google Scholar for "Automatic Language Detection" and see our options.

This feature can be used in Composer alone. Composer can automatically (with the user's consent) add the lang or xml:lang as appropriated. This is one of the reason I'd rather reopen bug 359709 so we can keep browser-specific features separated from editor-wide features. Should we?
Attached patch incomplete fix (obsolete) — Splinter Review
Hi,
I tried to tinker a bit with this bug and came with a partial solution.

In nsEditorSpellCheck::InitSpellChecker(), I go from the editor dom element and goes up to the parents node until I find a lang, or xml:lang attribute. If there are none, same behaviour as before is applied : it tries to get language from pref.
Then, there is a loop that looks the names of all directories installed. If there is one that matches exactly the language + subcode requested, get that one. Otherwise, gets the first one that matches the language (first part of full identifier).

My patch still has a few problems.
* At first, this is my first attempt with C++ inside firefox, and my understanding of all that xpcom stuff is far far from perfect. So there are probably problems because of that.
* to choose if I am in a html or xhtml context, I find the content-type of the document. If it is "text/html", then I assume we're in html. I don't check if we're in application/xhtml+xml  because if we insert xhtml inside another xml document (svg or xul for example), content-type is not application/xhtml+xml. So I assume if content-type is not "text/html", then we're in xhtml context. I think that that function is only called from within a html or xhtml editable tag, but may be there is a more robust way to get the context.
* To match the required language against installed dictionaries, I use the same method that is used to match locals with chrome.manifest files. So, I just copy-pasted code because I didn't find a way to avoid duplication.
* That behaviour could be improved to :
- match the shortest code ie : prefer "en" instead of "en-US" if "en-GB" is requested and not available.
- try to match more than the first part of the code ie : prefer "en-GB" instead of "en-US" if "en-GB-oed" is requested and not available
* If "lang" attribute is modified dynamically, another check might be done, and language of spellchecker could be changed if needed
* Last, but may be the more difficult problem : when lang requested is not available, or when lang attribute exists and is empty, spellchecker is set by preferences instead of being disabled. That is because :
- I did not succeed in disabling spellchecker.
- Even if I succeed, if user still wants to enable spellchecker with a language she chooses, nsEditorSpellCheck::InitSpellChecker() would be called again, and spellchecker would be disabled again.

Even if my solution has many flaws, I still post it because may be it will be useful if someone wants to investigates more. Also, if someone wants to point me directions to improve my patch, I can work on it more.
*** Bug 360664 has been marked as a duplicate of this bug. ***
(In reply to comment #5)
> I could not find <html lang= in any page with a comment box or forum that I
> tried. 
Indeed.
So, I was thinking about comparing the spellchecks between the installed dictionaries to find out in which language we're writing.
If dictionary A finds 3 errors when dictionary B finds 1 error, then we would suppose that the dictionary B is the good one for this text.
I don't think it would take much words to find out and this would also work for Thunderbird, which is a plus.
(In reply to comment #22)
> If dictionary A finds 3 errors when dictionary B finds 1 error, then we would
> suppose that the dictionary B is the good one for this text.
> I don't think it would take much words to find out and this would also work for
> Thunderbird, which is a plus.

See the last updates here:
http://en.design-noir.de/mozilla/dictionary-switcher/ (Firefox)
http://en.design-noir.de/mozilla/dictionary-switcher-tb/ (Thunderbird)

The Firefox version is a bit adventuresome, because it's not easy with unlimited textboxes floating around. The Thunderbird version is pretty straightforward though -- I could even write a patch for Thunderbird based on that.
(In reply to comment #23)
> The Thunderbird version is pretty straightforward though -- I could even
> write a patch for Thunderbird based on that.

-> bug 367312
Just FYI, Firefox (and Seamonkey too) has language detection code implemented already.

If you do context click on any element and language has been somehow derived, you will get "Properties" as the last item in the context menu. Selecting this item, a window pops up and you get current language in "Other properties" section of the window.
Attached file Test case
Here's a testcase that demonstrates how this behaviour should work.

(Note that bilingual systems probably don't often use lang tag at present. However maybe they would do if browsers took some notice of it!)

Particularly for language teaching, this would be pretty helpful.
if simple as-you-type dictionary detection is likely to take time to implement, a quick workaround could be to allow the user to set multiple dictionaries at the same time until the real fix be released.

I'm french and i'd be happy to set both en and fr on Firefox since i only use these languages

If you want to try, Midori webbrowser on Linux already allows that
Can't use spell checking in text with 2 or more languages. Please fix this bug either in the Firefox and the Thunderbird.
OS: Windows XP → All
Version: 2.0 Branch → Trunk
(In reply to comment #7)
> (In reply to comment #4)
> > But it is my understanding that this nsEditorSpellCheck::InitSpellChecker() is
> > called only once per browser instance?
> 
> nsEditorSpellCheck is no service and you get one instance per input/textarea
> element for which spell checking is possible. InitSpellChecker is thus called
> once per element and would thus be the appropriate place for figuring out the
> page's default language.
> 
> You should be able to start searching from aEditor->GetRootElement. Make sure
> to consider HTML's lang attribute and xml:lang (whichever is appropriate) and
> as a bonus it would be nice to also select not completely specified languages
> (e.g. select es-ES for lang="es").
> 
> (In reply to comment #2)
> > but it'd be nice to have for those bilingual people out there.
> 
> Of which there are an awful lot on the English dominated Internet of a polyglot
> world. :)

Unfortunately, even if it's not too hard to check the lang that has to be used during the call to nsEditorSpellCheck::InitSpellChecker() (for each editor's instance), the problem is quite hard because there is only one instance of nsISpellChecker (implemented by extensions/spellcheck/) which is used to set the dictionary. In other words, changing the dictionary for one editor will propagate the change for all the editors. That's exactly the behavior you have when you change the language with right click -> Languages -> <language>. And that's exactly what we don't want for @lang. Actually, we may don't want that when changing the language manually too.

On top of my head, I would say the best way to fix this bug would be to set the desired dictionary when the editor got the focus so it could be possible to set the dictionary to the language corresponding to @lang and still be able to revert it when using a field with another @lang or with the default one.
Summary: inline form spellchecking should respect the page/field language → Spellchecker should respect @lang
Attached patch Patch v1 (obsolete) — Splinter Review
Ehsan, could you review the editor changes?
Who would you recommend for spellchecker changes review?

This patch updates the spellchecker's dictionary each time an editor is focused, taking in account @lang. For the moment, @lang has to correspond exactly to the name of one of your dictionary (like 'en-US' or 'fr-FR'). Checking for the first part of the lang code will be a follow-up.

Unfortunately this patch comes with one regression: if the user change the dictionary for one field (using right click -> languages -> <dictionary>) this will be forgotten as soon as he/she focus another field. I think the current behavior is really annoying (this is saved as long as you don't 'revert' it). If that can be done in a follow-up, I would prefer to have this saved to be used by default for the entire document and not the entire session.

By the way, my previous comment had some mistakes. There isn't one instance of nsISpellChecker but on for each editor. Actually, I think that's the spellcheck engine service but I will not bet on it...
Assignee: nobody → mounir.lamouri
Status: NEW → ASSIGNED
Attachment #461208 - Flags: review?(ehsan)
You should probably use the algorithm in HTML5: <http://www.whatwg.org/html/#language>
Yes, you forgot on @xml:lang.
I'm concerned that since the majority of the web is en-US that this feature will constantly select the en-US dictionary, effectively rendering my en-CA dictionary useless.

It annoys me when I'm see red squiggly lines under words I and everyone else north of the border considers to be spelt correctly. 

Does to page or text field have to explicitly specify a language to trigger this feature? Is it possible for this feature to, say, select to top ranking en-* when the page specifies en-US?
Maybe the spell-checker could modify @lang of current input box in DOM each time user changes the spelling language. This would avoid regression from comment #35 and Ian from comment #38 would became irritated on each page reload only.
(In reply to comment #35)
> Created attachment 461208 [details] [diff] [review]
> Patch v1
> 
> Ehsan, could you review the editor changes?

Sure.

> Who would you recommend for spellchecker changes review?

Smaug maybe?

> This patch updates the spellchecker's dictionary each time an editor is
> focused, taking in account @lang. For the moment, @lang has to correspond
> exactly to the name of one of your dictionary (like 'en-US' or 'fr-FR').
> Checking for the first part of the lang code will be a follow-up.

Have you filed that bug?

> Unfortunately this patch comes with one regression: if the user change the
> dictionary for one field (using right click -> languages -> <dictionary>) this
> will be forgotten as soon as he/she focus another field. I think the current
> behavior is really annoying (this is saved as long as you don't 'revert' it).
> If that can be done in a follow-up, I would prefer to have this saved to be
> used by default for the entire document and not the entire session.

You mean it will be forgotten for the same field, or that it would just revert for the other fields?

> By the way, my previous comment had some mistakes. There isn't one instance of
> nsISpellChecker but on for each editor. Actually, I think that's the spellcheck
> engine service but I will not bet on it...

There is one instance of the spell checker component per editor, and there is one editor per plaintext fields (input & textarea) and one per any document which is in design mode or has contenteditable element(s).
Comment on attachment 461208 [details] [diff] [review]
Patch v1

>+NS_IMETHODIMP
>+nsEditorSpellCheck::UpdateCurrentDictionary(nsIEditor* aEditor)
>+{
>+  nsresult rv;
>+  nsXPIDLString dictName;
>+  PRBool dictHasBeenSet = PR_FALSE;
>+
>+  nsCOMPtr<nsIDOMElement> rootElmt;
>+  rv = aEditor->GetRootElement(getter_AddRefs(rootElmt));
>+  NS_ENSURE_SUCCESS(rv, rv);

I don't see how this is going to work.  This root element is the anonymous div for plaintext fields, which doesn't have any parent, and doesn't have the lang attribute set.

>+  // We do a ascendant search into the tree to found a node with @lang specified

Nit: s/found/find/.

>+  // beginning with the root element. As soon as a @lang value is found (even if
>+  // the empty string), we stop.
>+  for (nsCOMPtr<nsIContent> curElmt = do_QueryInterface(rootElmt);
>+       curElmt && curElmt->GetAttr(kNameSpaceID_None, do_GetAtom("lang").get(),
>+                                   dictName);
>+       curElmt = curElmt->GetParent());

Let's use the HTML5 algorithm here.

>+void
>+nsEditor::OnFocus(nsIDOMEventTarget* aFocusEventTarget)
>+{
>+  InitializeSelection(aFocusEventTarget);
>+
>+  if (mInlineSpellChecker) {
>+    mInlineSpellChecker->UpdateCurrentDictionary();
>+  }
>+}

Why do we have to update the current dictionary on focus?  Specifically, I don't think that it's good to hit the pref service once per focus.  And what happens if you change the lang attribute while a control is not focused?


Also, is it possible to get some tests for this?
Attachment #461208 - Flags: review?(ehsan) → review-
(In reply to comment #38)
> Does to page or text field have to explicitly specify a language to trigger
> this feature? Is it possible for this feature to, say, select to top ranking
> en-* when the page specifies en-US?

IMO correct behaviour for this feature would be:

1. Use user default language if the page/element does not specify language (like for example this bugzilla page).

2. If the page or element does specify language, with the full code (en-US) then that language should be used if installed, otherwise fall back to en-whatever.

3. If the page specifies the short code (en) then the system should use your top installed language that matches en-.

Relatively few pages (about 10% according to some old stats from google I found) specify lang at all, and I'm sure very much fewer give a specific country code, so I think it's unlikely to cause big problems. I.e. the majority of the web is not 'en-US'. The majority of the web is probably '' with a relatively small minority being 'en' and an even tinier one being 'en-US'.
(In reply to comment #42)
> 2. If the page or element does specify language, with the full code (en-US)
> then that language should be used if installed, otherwise fall back to
> en-whatever.

I'm going to have to vehemently disagree with this as it embodies a set of values not common outside of the US. There is a difference between agreeing to converse in a specified language and being force to conform to someone else's variant of it.  Again I will state that I would be most annoyed if my Canadian English dictionary was overridden on English pages for *any* reason other than my own.  I'm sure the same sentiment prevails among Québécois (vs French) and Schweizerdeutsch (vs German) and ...

I believe the more respectful behaviour would be: If the page or element does specify language, with the full code (en-US) or code (en), then the system should use your
top installed language that matches en-.
(In reply to comment #43)
> I'm going to have to vehemently disagree with this as it embodies a set of

Well, you can if you like, but this particular 'set of values' came from outside the US (although I don't claim to be common).

- Is there a reason why your proposal to ignore the hyphenated part is more technically correct somehow? Seems to me the system should behave in the best [most accurate] way it can manage with regard to the page content unless there is an overwhelming reason why not (abuse, etc).

- Have you ever seen a page which specifies lang="en-US"? I never have. Admittedly, I didn't go looking, but when I do see or write language tags (for English), they're 'en' only, so it doesn't seem to me to be common practice.

- Do you actually have the en-us dictionary installed? I don't. If you don't have it, the 'correct behaviour' as proposed won't (obviously, can't) make you use it. So you'd have to voluntarily install the language of your despised overlords in order to be negatively affected by this behaviour...

- Changing a default spellchecker language is hardly 'forcing'; apart from just ignoring it, you'd still be able to change it on the page.

- Doesn't your argument apply just as much to totally different languages i.e. this whole feature entirely (why are you 'forcing' people to speak in english in this forum if they want to speak Japanese! - answer, nobody is, it's just a spellchecker default they can change).

- There may be other languages where the difference is more significant (perhaps a matter of ease of understanding, in some cases, rather than national pride) and it's desirable to have a spellchecker switched to the variant you are expected to be typing in.

My reason for commenting on this bug is that I have built systems for language teaching and we want to make textboxes where students can type in answers to questions in French. These are (mainly) English students so they obviously don't want to change their browser default or find out that it's stuck in French next time they go type into a forum (elsewhere on the web, or even elsewhere on our site). I'm pretty sure there are use cases like this for being specific about particular language variants too (e.g. a course that is teaching French French and you'll lose marks in the exam if you use Quebec French, or vice versa), even though it will only affect people who have installed those dictionaries.
I have stated my case clearly and succinctly.  I'll let it stand.
(In reply to comment #43)
> I believe the more respectful behaviour would be: If the page or element does
> specify language, with the full code (en-US) or code (en), then the system
> should use your
> top installed language that matches en-.

That should rather be a preference option as to which language variant is to be (forcefully) preferred. Such a preference should be settable on a dialog listing all available dictionaries just as the default dictionary to use if not appropriate one is installed (that might also be spellcheck disabled!)

But that's an enhancement to the browser selecting the spellcheck language based on @lang attribute of page language header (or meta tag) when there was no @lang attribute.
Blocks: 660506
Attached patch updated patch (obsolete) — Splinter Review
updated patch.
This patch choose dictionary  according to editor element @lang. @lang is computed with html5 algorithm http://www.whatwg.org/specs/web-apps/current-work/multipage/#language
If no @lang is set, try to use last selected dictionary, or dictionary for current locale.

If there is no dictionary for required language (for example, language is pt-BR), try to use dictionary for "generic language" (ie: pt). If this does not work, try to use any dictionary matching language (ie: for example, pt-PT).

If no dictionary was found. Either:
- if no @lang was set, try to get en-US or any available dictionary.
- if some @lang was set, do not set a default dictionary.
I think it makes sense to disable dictionaries instead of enabling a wrong dictionary. If I understand correctly, that would also fix #660506

Also, with this patch, default dictionary is not stored automatically in prefs anymore. It's only stored when user sets it manually (otherwise, it would have been set to the lang of last visited page, which is probably wrong).

It does not support @lang modification. Ideally, it would be nice to have following behaviour when @lang attribute is changed by script:
 - if user has changed spellcheck lang manually, don't do anything.
 - otherwise, sets spellcheck lang to the new computed one.
But that means I must observe attributes change, creation and deletion. I'm not sure if it's worth it. What are the usecase of changing lang attribute for some element ?

Also, I couldn't write test for this patch because I don't known how to add a dictionary at runtime. If I don't want to do that, what are the other possibilities for testing ?
Attachment #244939 - Attachment is obsolete: true
Attachment #547702 - Flags: review?(ehsan)
Comment on attachment 547702 [details] [diff] [review]
updated patch

Review of attachment 547702 [details] [diff] [review]:
-----------------------------------------------------------------

In general I like this patch.  I'm minusing because of the GetLanguage part, but the rest generally looks good.

About testing strategies, you can load dictionaries from a directory using the loadDictionariesFromDir API (see <http://mxr.mozilla.org/mozilla-central/source/extensions/spellcheck/hunspell/tests/unit/test_hunspell.js#212> for an example).  But I would be fine with tests written without loading custom dictionaries, if we use the same code for this and the :lang CSS selector (see below).  Once we do that, we can get more unit tests for the :lang selector (this is our only test for that now: <http://mxr.mozilla.org/mozilla-central/source/layout/reftests/unicode/unicode-lang.html?force=1>) and then we can trust that editor is going to use the same code to retrieve the language name, which reduces the need for loading additional dictionaries in our tests.  But we should still get tests to make sure that the stuff in other languages is not spell-checked by the editor.  You can use the existing spellcheck tests in <http://mxr.mozilla.org/mozilla-central/source/layout/reftests/editor> as a sample for that.

Once we have this level of testing, we can just load some simple dictionaries for some fictional languages (such as testing, testing-AB, testing-AC) to make sure that they're not going to affect other tests in the system, and use them to test the fallback language stuff.

For dynamic changes, let's see if bz has any ideas how to handle them efficiently, but I think that we definitely want to handle them, because it's the only way that a web page can signal us to change the spell checking language.  But this can definitely be done in a follow-up bug.

Does this make sense?

::: editor/composer/src/nsEditorSpellCheck.cpp
@@ +251,4 @@
>      }
>    }
>  
> +  // If we have not set editor, and editor has not a @lang, we try to get a

Nit: "If the editable element doesn't have a lang attribute, ..."

@@ +283,5 @@
>  NS_IMETHODIMP    
> +nsEditorSpellCheck::GetLanguage(nsIEditor* aEditor, nsAString &aResult)
> +{
> +  nsCOMPtr<nsIDOMElement> rootElement;
> +  nsresult rv = aEditor->GetRootElement(getter_AddRefs(rootElement));

This only work correctly for the plaintext editor.  For HTML editor, you need to call GetActiveEditingHost() (you should probably cast to nsHTMLEditor manually :( ), and use the @lang attribute on that element if it exists, and walk up the parent chain if it doesn't.

@@ +289,5 @@
> +
> +  nsCOMPtr<nsIContent> rootContent = do_QueryInterface(rootElement);
> +  NS_ENSURE_TRUE(rootContent, NS_ERROR_FAILURE);
> +
> +  nsCOMPtr<nsIContent> parent = rootContent->GetParent();

So, we have some code here <http://mxr.mozilla.org/mozilla-central/source/layout/style/nsCSSRuleProcessor.cpp#1613> (also see the GetLang function in that file) which implements the same thing for the :lang CSS selector.  I think we should consolidate all this into an nsIContent method named GetLanguage or something, and then we can simply call this function in both places.
Attachment #547702 - Flags: review?(ehsan) → review-
> For dynamic changes, let's see if bz has any ideas how to handle them
> efficiently

So the goal here is that you want to be notified when the language of some particular content node changes, right?
(In reply to comment #50)
> > For dynamic changes, let's see if bz has any ideas how to handle them
> > efficiently
> 
> So the goal here is that you want to be notified when the language of some
> particular content node changes, right?

Yes.
We don't have an existing notification system that would work for that.  We'd need to add something.

That said, you only care about this for things that have frames, no?  Is there a reason that GetStyleVisibility()->mLanguage (which is somewhat different, in that I think it ignores Content-Language but does look at charset and ignores xml:lang, but we could fix those differences if we needed to) won't work for you?  That has automatic change notifications built in, sorta.
(In reply to comment #52)
> We don't have an existing notification system that would work for that. 
> We'd need to add something.
> 
> That said, you only care about this for things that have frames, no?  Is
> there a reason that GetStyleVisibility()->mLanguage (which is somewhat
> different, in that I think it ignores Content-Language but does look at
> charset and ignores xml:lang, but we could fix those differences if we
> needed to) won't work for you?  That has automatic change notifications
> built in, sorta.

But we need to be notified when the language changes, in order to be able to change our dictionary and trigger another spell check round.
Right; if the style property is good enough we could easily add a notification when it changes.
Attached patch wip (obsolete) — Splinter Review
wip patch:
it deals with ehsan suggestions except for the html editor handling (ie: GetActiveEditingHost method).
What needs to be done is call GetActiveEditingHost at the right moment (editor needs to be focused otherwise, result is wrong). ehsan suggested on irc to make GetActiveEditingHost a [noscript,notxpcom] method of nsIHTMLEditor.
Attachment #547702 - Attachment is obsolete: true
(In reply to comment #55)
> Created attachment 548559 [details] [diff] [review] [review]
> wip
> 
> wip patch:
> it deals with ehsan suggestions except for the html editor handling (ie:
> GetActiveEditingHost method).
> What needs to be done is call GetActiveEditingHost at the right moment
> (editor needs to be focused otherwise, result is wrong). ehsan suggested on
> irc to make GetActiveEditingHost a [noscript,notxpcom] method of
> nsIHTMLEditor.

There can be more than one contenteditable element per editor. Therefore getting this part done looks quite tricky. Computing editor lang cannot be done during initialization. It should be done at spell checking time. Or, as volkmar proposed in comment #35, at focus time.
(In reply to comment #56)
> (In reply to comment #55)
> > Created attachment 548559 [details] [diff] [review] [review]
> > wip
> > 
> > wip patch:
> > it deals with ehsan suggestions except for the html editor handling (ie:
> > GetActiveEditingHost method).
> > What needs to be done is call GetActiveEditingHost at the right moment
> > (editor needs to be focused otherwise, result is wrong). ehsan suggested on
> > irc to make GetActiveEditingHost a [noscript,notxpcom] method of
> > nsIHTMLEditor.
> 
> There can be more than one contenteditable element per editor. Therefore
> getting this part done looks quite tricky. Computing editor lang cannot be done
> during initialization. It should be done at spell checking time. Or, as volkmar
> proposed in comment #35, at focus time.

Doing that on focus makes sense to me...
(In reply to comment #54)
> Right; if the style property is good enough we could easily add a notification
> when it changes.

If we modify it to match the HTML5 spec, that would probably work fine.
The hard part there is xml:lang, since it needs to apply to all elements and not just the ones that do attribute mapping...
(In reply to comment #59)
> The hard part there is xml:lang, since it needs to apply to all elements and
> not just the ones that do attribute mapping...

Then maybe we can defer this part to another bug.
Well, yes, but if that'll just never work we might as well not even start down this route.

David, what do you think of the above bits about making the style system's concept of "language" take into account xml:lang?
Attached patch patch v2 (obsolete) — Splinter Review
new patch.
dictionary language is updated each time editor is focused. To implement this, I mainly copied stuff from volkmar's patch. The main difference is I initiate a full spell check (with SpellCheckRange) if language has changed.
Otherwise, first focus in a contenteditable area does not check spell (but second focus does).
Attachment #548559 - Attachment is obsolete: true
Attachment #548731 - Flags: review?(ehsan)
PRUnichar *previousDictionary = nsnull; 
nsDependentString previousDictionaryStr;
if (NS_SUCCEEDED(mSpellCheck->GetCurrentDictionary(&previousDictionary))) {
    previousDictionaryStr.Assign(previousDictionary);
}

I wonder if there's a better way to do that. May be nsEditorSpellCheck::GetCurrentDictionary could return a nsAString instead PRUnichar** ?
Comment on attachment 548731 [details] [diff] [review]
patch v2

Review of attachment 548731 [details] [diff] [review]:
-----------------------------------------------------------------

The patch looks very good, r=me on it with the comments below addressed.

Thanks a lot for working on this!

::: editor/composer/src/nsEditorSpellCheck.cpp
@@ +188,5 @@
>    rv = mSpellChecker->SetDocument(tsDoc, PR_TRUE);
>    NS_ENSURE_SUCCESS(rv, rv);
>  
> +  // do not fail if UpdateCurrentDictionary fails because this method may
> +  // success later.

Nit: succeed

::: layout/reftests/editor/338427-ref.html
@@ +1,4 @@
> +<!DOCTYPE html>
> +<html>
> +<body>
> +    <textarea lang="testing-XX">strangeimpossibleword</textarea>

Technically, the -ref file is the reference rendering, and the other file is the thing to be tested.  So, these two files should probably be swapped.

::: layout/reftests/editor/reftest.list
@@ +62,5 @@
>  == 642800.html 642800-ref.html
>  == selection_visibility_after_reframe.html selection_visibility_after_reframe-ref.html
>  != selection_visibility_after_reframe-2.html selection_visibility_after_reframe-ref.html
>  == 672709.html 672709-ref.html
> +== 338427.html 338427-ref.html

Can you add another test here for the dynamic change case?  You can put lang="en-US" on the textarea, then upon load change it to "testing-XX", then .focus() and .blur().

Also, can you please add equivalent tests for contenteditable?
Attachment #548731 - Flags: review?(ehsan) → review+
Blocks: 674685
Attached patch patch v2.1 (obsolete) — Splinter Review
(In reply to comment #64)
> Nit: succeed

fixed

> Technically, the -ref file is the reference rendering, and the other file is
> the thing to be tested.  So, these two files should probably be swapped.

fixed


> Can you add another test here for the dynamic change case?  You can put
> lang="en-US" on the textarea, then upon load change it to "testing-XX", then
> .focus() and .blur().

do you mean adding commented tests for bug #674685 ?

> Also, can you please add equivalent tests for contenteditable?

Fixed. This was slightly tricky because it looks like spellcheck="disable" is not supported on contenteditable. So, the reference document contains a non editable area. Then, in the test document, I must first focus the contenteditable, to update dictionary. Then, blur to compare with non editable reference document.
But if I focus then blur, I don't capture a focus event in editor listener. Neither if I set a 0 or small timeout. So, I've set a 1000 timeout. I known this is fragile, but I didn't find a better solution.
Assignee: mounir → arno
Attachment #548731 - Attachment is obsolete: true
> Can you add another test here for the dynamic change case?  You can put
> lang="en-US" on the textarea, then upon load change it to "testing-XX", then
> .focus() and .blur().


Actually, this does not work. Calling SetCurrentDictionary to an invalid language, does nothing. So an invalid language at startup (or first focus) does not initialize spellchecker. But an invalid language after spellchecker has been initialized to a language does not reset it.
(In reply to comment #66)
> > Can you add another test here for the dynamic change case?  You can put
> > lang="en-US" on the textarea, then upon load change it to "testing-XX", then
> > .focus() and .blur().
> 
> Actually, this does not work. Calling SetCurrentDictionary to an invalid
> language, does nothing. So an invalid language at startup (or first focus) does
> not initialize spellchecker. But an invalid language after spellchecker has
> been initialized to a language does not reset it.

Hmm, well, nevermind those tests then, I guess.  :(
Attached patch patch v2.2 (obsolete) — Splinter Review
(In reply to comment #67)
> (In reply to comment #66)
> > > Can you add another test here for the dynamic change case?  You can put
> > > lang="en-US" on the textarea, then upon load change it to "testing-XX", then
> > > .focus() and .blur().
> > 
> > Actually, this does not work. Calling SetCurrentDictionary to an invalid
> > language, does nothing. So an invalid language at startup (or first focus) does
> > not initialize spellchecker. But an invalid language after spellchecker has
> > been initialized to a language does not reset it.
> 
> Hmm, well, nevermind those tests then, I guess.  :(

A possibility is to make SetCurrentDictionary handle empty string by disabling spell check. Then, calling SetCurrentDictionary with empty string at start of UpdateCurrentDirectionary. Can you review this ?
Attachment #548933 - Attachment is obsolete: true
Attachment #549230 - Flags: review?(ehsan)
Comment on attachment 549230 [details] [diff] [review]
patch v2.2

Something is wrong with the patch. I'll work on it again later.
Attachment #549230 - Flags: review?(ehsan)
Comment on attachment 549230 [details] [diff] [review]
patch v2.2

Sorry, I was mistaken
Attachment #549230 - Flags: review?(ehsan)
Comment on attachment 549230 [details] [diff] [review]
patch v2.2

Review of attachment 549230 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the below comments fixed.

::: content/base/public/nsIContent.h
@@ +948,5 @@
> +   * Determing language. Look at the nearest ancestor element that has a lang
> +   * attribute in the XML namespace or is an HTML element and has a lang in
> +   * no namespace attribute.
> +   */
> +  void GetLang(nsAString& aResult) {

Oh, I almost forgot about this!  Make this a const method please?

::: layout/reftests/editor/338427-2.html
@@ +6,5 @@
> +    editor.focus();
> +    window.setTimeout(function() {
> +        editor.blur();
> +        document.documentElement.className = '';
> +    }, 1000); // XXX: 0 timeout is not enought to trigger a focus event.

Change this to:

setTimeout(function() {
  setTimeout(function() {
    editor.blur();
    document....
  }, 0);
}, 0);

::: layout/reftests/editor/338427-3.html
@@ +7,5 @@
> +    editor.focus();
> +    window.setTimeout(function() {
> +        editor.blur();
> +        document.documentElement.className = '';
> +    }, 1000); // XXX: 0 timeout is not enought to trigger a focus event.

Same thing here.
Attachment #549230 - Flags: review?(ehsan) → review+
(In reply to comment #71)
> ::: layout/reftests/editor/338427-2.html
> @@ +6,5 @@
> > +    editor.focus();
> > +    window.setTimeout(function() {
> > +        editor.blur();
> > +        document.documentElement.className = '';
> > +    }, 1000); // XXX: 0 timeout is not enought to trigger a focus event.
> 
> Change this to:
> 
> setTimeout(function() {
>   setTimeout(function() {
>     editor.blur();
>     document....
>   }, 0);
> }, 0);
> 
> ::: layout/reftests/editor/338427-3.html
> @@ +7,5 @@
> > +    editor.focus();
> > +    window.setTimeout(function() {
> > +        editor.blur();
> > +        document.documentElement.className = '';
> > +    }, 1000); // XXX: 0 timeout is not enought to trigger a focus event.
> 
> Same thing here.

Err, why isn't this using focus event listeners?
Attached patch patch v2.3 (obsolete) — Splinter Review
(In reply to comment #71)

Updateded patch with nsIContent::GetLang a const method, and with reftests using focus listeners.

> r=me with the below comments fixed.

Should I mark this patch as r+ myself then ?
Attachment #549230 - Attachment is obsolete: true
Keywords: checkin-needed
This patch doesn't build on try:
http://tbpl.mozilla.org/?tree=Try&rev=3795680ed63f

Actually, it looks like only debug builds do not build (at least, for the moment).
Keywords: checkin-needed
Attached patch patch v2.4 (obsolete) — Splinter Review
Build failure was a typo.

about http://tinderbox.mozilla.org/showlog.cgi?log=Try/1312356145.1312356574.28744.gz:
I modified test_bug484181.html to focus content editable area before checking suggestion
about http://tinderbox.mozilla.org/showlog.cgi?log=Try/1312344377.1312345911.12741.gz:
I modified subtst_contextmenu.html to add lang="en-US" to spellchecked elements. I'm not sure if that's enough
About http://tinderbox.mozilla.org/showlog.cgi?log=Try/1312336710.1312338332.9722.gz:
I don't known what to do.

I also found and fixed another bug:
when there is no language, CheckWords returns a failure error. So, I must check return code.

Can someone push this patch to try server now ?
Attachment #549553 - Attachment is obsolete: true
Failing mochitest-1 tests on MacOS X and Windows. In addition, it's failing reftests:
http://tbpl.mozilla.org/?tree=Try&rev=1af7ee004fb3
Check this for checking several languages simultaneously
https://bugzilla.mozilla.org/show_bug.cgi?id=660506
Attached patch patch v2.5 (obsolete) — Splinter Review
I fixed mochitest by focusing contenteditable at test starts.

Reftests failed because setting lang="testing-XX" on textarea changed default font in some configurations. I fixed this by explicitly defining a font-family.

Attached patch passes tests correctly:
http://tbpl.mozilla.org/?tree=Try&rev=2c9a8d9708d1

except on android because of a timeout. What can be the cause of this timeout ?
http://tinderbox.mozilla.org/showlog.cgi?log=Try/1312534392.1312536112.31421.gz
Attachment #550415 - Attachment is obsolete: true
Attachment #551030 - Flags: review?
It looks like most problems that users report with repeatedly choosing ONE spell checking language would be satisfactorily solved if the user could once for all setup SEVERAL dictionaries to be used SIMULTANEOUSLY for the languages he can write, among those installed on his system.  That is especially needed for e-mail since many people write them in several languages (and that's the way Evolution does it).
It seems fairly simple to do, see  https://bugzilla.mozilla.org/show_bug.cgi?id=676500




I'm not sure what is being made in these patches, but be sure not to override the user's setting for the spell checker language(s), especially if, one day hopefully, it will be set to several languages like fr_FR,en_US,en_GB,ru_RU.
Automatic detection should be done only if the user has set spell check language setting to "automatic".

On the other hand:
1) is it because a page is displayed in French that a bug report requested to be in English should be spell checked in French?
2) is text necessarily in a single language?  Looking at the e-mails I most often write, the answer is a definite NO.

3) isn't it much easier and sufficient that the user set once for all the set of all the languages he can write?

BTW: was all text entered in this thread spell checked ?  ;-)
(In reply to André Pirard from comment #79)
> 1) is it because a page is displayed in French that a bug report requested
> to be in English should be spell checked in French?

A page may state that some input fields are expecting French, some other fields on the same page/form are expecting English.
This would especially be the case on pages that allow input of translations or text localized in multiple languages.
Like say content-editor for some CMS.

> 2) is text necessarily in a single language?  Looking at the e-mails I most
> often write, the answer is a definite NO.
> 
> 3) isn't it much easier and sufficient that the user set once for all the
> set of all the languages he can write?

IMHO when user's global selection of language(s) for spell checking applies should get weighted against the kind of form the user is on. When individual input fields carry language tag spell checker should default to exactly that language.
When there is just language set via HTTP header or document's lang tag on root node it could be auto-detected between user's known languages and page language.
Though this should then be left to the user to decide.
> BTW: was all text entered in this thread spell checked ?  ;-)

Clearly the OP used a French spelling dictionary when originally reporting this bug (^:

Note that the majority of the non-English sites I come across seem to specify English as their language.  Developers are not always aware that there is a setting that they need to change, and some tools are hard-coded to output a template which specifies English, so that the site owner cannot override it even if they wanted to.  Hence there needs to be (a) a manual override for the spelling dictionary; and / or (b) perhaps a heuristic to determine what language a page "really" is in, to cope with the real world as well as the W3C specifications.  I suppose (b) needs to be a separate bug report, once this one is fixed and deployed.
I just wrote:
> there needs to be (a) a manual override for the spelling dictionary; and/or
> perhaps a heuristic to determine what language a page "really" is in

Or bug 676500 should be fixed instead of this one, or as well.  +1 for the other bug as the preferred solution.
(In reply to era eriksson from comment #81)
> Note that the majority of the non-English sites I come across seem to
> specify English as their language.  Developers are not always aware that
> there is a setting that they need to change, and some tools are hard-coded
> to output a template which specifies English, so that the site owner cannot
> override it even if they wanted to.

Awareness:  I once wrote to a university to alert them that the title page of their rector, administrator and more stated "vacancies" ;-) 

And we're speaking not only of a single global language setting but of the hope that language attributes will be defined for fields or paragraphs (please fix your page so that I can type in).
Taking Google Translation as an example of knowhow, they try to translate <code> when they certainly shouldn't and I never saw any page adding notranslate to code (except mine ;-) ).

So, we must choose between relying on the above fuzziness  to determine language or of relying on the user's notion of what's the language he's typing.

(In reply to era eriksson from comment #82)
> Or bug 676500 should be fixed instead of this one, or as well.  +1 for the
> other bug as the preferred solution.

Yes, I think that bug 676500 seems simple enough to implemented first. Then we can post a message in the 30 or so (including duplicates) bugs describing an issue with choosing the language to ask if a problem remains.

I have used multi-language spell checking with Eudora for many years and I can testify it's delightful. The only drawback could be that some words exist in one language that can allow a mistake in another. But, having used the French+English pair for which the risk is much higher than generally, I can tell it's a smaller nuisance that is by far counterbalanced by the advantage of being able to use foreign (English) terms or phrases in a French text.
Is it better to use heuristics to have words determine a dictionary to be used to check those words?

Bug 676500 is primarily targeted at e-mail, but that's the same spelling checker, issues and reply, isn't it.

The other improvement to speed up spell checking is to determine what is not a word (all-uppercase? long strings, esp with digits and special characters, <code>, fixed width? etc...).

I have quoted this reply to bug 676500.
See you there, maybe.
Attachment #551030 - Flags: review? → review?(ehsan)
(This is basically an expanded version of the argument in comment 80, which I'm agreeing with but have one more use case.)

Multi-language spell checking could be good as a user default but there are two use cases where it is problematic, both of which are 'page-defined':

1) Language teaching. The user is English but is learning French. We want to enable the French spellchecker (that doesn't accept English words) in a text field which is defined for French language entry even though their default language is English. As far as possible, this should work without them having to configure anything. I realise that probably won't work because they'll have to at least install the dictionary, but...

2) Multi-lingual content entry. The user may be English or French and ordinarily would prefer a spellchecker dictionary that accepts both, but on this occasion is entering content for a multilingual website. The form contains two text areas, one for English and one for French. The English one should show French spellings as errors and the French one should show English spellings as errors. Again this should work without them having to configure anything.

If there is concern about pages that are incorrectly defined, then one possible heuristic to spot 'the page designer really means it' cases would be to see if the lang or xml:lang attribute is explicitly defined on the textarea (or other spellcheck-required) tag itself rather than a higher-up tag.

In other words, say the page has lang="en" due to some broken authoring tool, but the user's preferred language is French (or French + English + Spanish, in the new wonderful multi-spellchecker world) then perhaps the user preference should always override the page language. But if there is a specific <textarea lang="en"> on the page then it should default to the page-specified language (English only on this case), allowing the user a right-click override.
(In reply to s.marshall from comment #84)
> Multi-language spell checking could be good as a user default but there are
> two use cases where it is problematic, both of which are 'page-defined':
> ...

In other words, the list of multi-languages should contain an option or dummy language called, say, "as_requested".
If the user keeps it turned on by default, it means that he accepts his set of languages being disabled and the sole requested language being used for just the course of filling a prescribed field.
If he disagrees with some request, he may temporarily disable the dummy (well, or just make a spelling mistake ;-))
This option appearing as a dummy language makes it apparent when the user fumbles in his languages for a problem.

I would not let the language definition of a page, division etc... control input.
That's because it's not necessarily the intention of the page author, and probably so little often the case that the user may find himself disabling that option most of the time and finally turning it of permanently.
I would restrict that feature to language applied to input fields specifically or to a new language definition specifically made for input fields.

Multi-language spelling has been thought of principally for e-mail but I'd say I'm glad to drop my eurocent in your Web discussion if that word were in your dictionary ;-)
Don't forget to pay a visit to bug 676500, though.
(In reply to Dão Gottwald [:dao] from comment #72)
> (In reply to comment #71)
> > ::: layout/reftests/editor/338427-2.html
> > @@ +6,5 @@
> > > +    editor.focus();
> > > +    window.setTimeout(function() {
> > > +        editor.blur();
> > > +        document.documentElement.className = '';
> > > +    }, 1000); // XXX: 0 timeout is not enought to trigger a focus event.
> > 
> > Change this to:
> > 
> > setTimeout(function() {
> >   setTimeout(function() {
> >     editor.blur();
> >     document....
> >   }, 0);
> > }, 0);
> > 
> > ::: layout/reftests/editor/338427-3.html
> > @@ +7,5 @@
> > > +    editor.focus();
> > > +    window.setTimeout(function() {
> > > +        editor.blur();
> > > +        document.documentElement.className = '';
> > > +    }, 1000); // XXX: 0 timeout is not enought to trigger a focus event.
> > 
> > Same thing here.
> 
> Err, why isn't this using focus event listeners?

Because spellchecking doesn't happen on focus, we post an event to the event loop and start the spellchecking when that event is processed.
Folks, *please* do not comment on this bug in order to discuss other bugs.  This distracts the people who are working on this bug (like me and Arno) and also makes the discussion impossible to find for somebody who's looking at the other bug.
Comment on attachment 551030 [details] [diff] [review]
patch v2.5

>diff -r 61bb2bb510c9 editor/libeditor/html/tests/test_bug484181.html
> function runTest() {
>   gMisspeltWords = ["haz", "cheezburger"];
>+  var edit = document.getElementById("edit");
>+  edit.focus();
>+  SimpleTest.executeSoon(function() {
>   is(isSpellingCheckOk(), true, "All misspellings before editing are accounted for.");
> 
>-  var edit = document.getElementById("edit");
>   append(" becaz I'm a lolcat!");
>   SimpleTest.executeSoon(function() {
>   gMisspeltWords.push("becaz");
>   gMisspeltWords.push("lolcat");
>   is(isSpellingCheckOk(), true, "All misspellings after typing are accounted for.");
> 
>   SimpleTest.finish();
>   });
>+  });
> }

Nit: could you please adjust the indentation here?

>diff -r 61bb2bb510c9 extensions/spellcheck/src/mozInlineSpellChecker.cpp
>+    if (NS_FAILED(rv))
>+      continue;

Why is this change needed?

>diff -r 61bb2bb510c9 layout/reftests/editor/338427-2.html
>--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
>+++ b/layout/reftests/editor/338427-2.html	Thu Aug 04 23:34:57 2011 +0200
>@@ -0,0 +1,16 @@
>+<!DOCTYPE html>
>+<html class="reftest-wait">
>+<script>
>+function init() {
>+    var editor = document.getElementById('editor');
>+    editor.addEventListener("focus", function() {
>+        editor.blur();
>+        document.documentElement.className = '';
>+    }, false);
>+    editor.focus();

As I said before, this is wrong.

Also, it's probably a good idea for you to run the reftests manually to verify that these tests path.  You can do this with this command, for example:

make -C /path/to/objdir reftest TEST_PATH=layout/reftests/editor/reftest.list


>diff -r 61bb2bb510c9 toolkit/content/InlineSpellChecker.jsm
>-    if (! spellchecker.CheckCurrentWord(this.mMisspelling))
>-      return 0;  // word seems not misspelled after all (?)
>+    try {
>+      if (! spellchecker.CheckCurrentWord(this.mMisspelling))
>+        return 0;  // word seems not misspelled after all (?)
>+    } catch(e) {
>+        return 0;
>+    }

Why is this change needed?

>-    var curlang = spellchecker.GetCurrentDictionary();
>+    var curlang = "";
>+    try {
>+        curlang = spellchecker.GetCurrentDictionary();
>+    } catch(e) {}

Also, why is this one needed?
Attachment #551030 - Flags: review?(ehsan)
I run the test on my system before submitting a patch either here, or on try server. Tests can succeed on my machine but fail in other environments. For example, as I told in comment 78, this can be caused by different font settings.

About, when the blur is made inside double timeout, updateCurrentDictionary is not called.

> Why is this change needed?

because when editor has an unkown language, there is no language set, and CheckWords returns a failure error.

Thanks for review and commiting the patch :)
(In reply to comment #89)
> I run the test on my system before submitting a patch either here, or on try
> server. Tests can succeed on my machine but fail in other environments. For
> example, as I told in comment 78, this can be caused by different font
> settings.

Hmm, I was talking about this failure http://tinderbox.mozilla.org/showlog.cgi?log=Try/1312534392.1312536112.31421.gz in comment 78.  But I just remembered that Android can't handle focus correctly yet.  So, can you just mark the test as needs-focus in the reftest manifest?  That should make Android skip the test.

> About, when the blur is made inside double timeout, updateCurrentDictionary is
> not called.

Why?

> > Why is this change needed?
> 
> because when editor has an unkown language, there is no language set, and
> CheckWords returns a failure error.

Then can we just not attempt to do a spell check in that case?
Attached patch patch v2.6 (obsolete) — Splinter Review
> But I just remembered that Android can't handle focus
> correctly yet.  So, can you just mark the test as needs-focus in the reftest
> manifest?  That should make Android skip the test.

What do you mean with: can't handle focus correctly ?
Because, if I understand correctly, needs-focus just check if content if out-of-process
http://hg.mozilla.org/mozilla-central/file/a0e3c589c8fa/layout/tools/reftest/reftest.js#l859
Is this the cause focus fails on Android, or should the directive be skip-if(Android) instead ?

I've investigated more about the CheckWords call

> >-    var curlang = spellchecker.GetCurrentDictionary();
> >+    var curlang = "";
> >+    try {
> >+        curlang = spellchecker.GetCurrentDictionary();
> >+    } catch(e) {}
> 
> Also, why is this one needed?

this code is executed when selecting a dictionary in context menu. This can
happen even when no dictionary is active. So this check is strictly needed.

> >-    if (! spellchecker.CheckCurrentWord(this.mMisspelling))
> >-      return 0;  // word seems not misspelled after all (?)
> >+    try {
> >+      if (! spellchecker.CheckCurrentWord(this.mMisspelling))
> >+        return 0;  // word seems not misspelled after all (?)
> >+    } catch(e) {
> >+        return 0;
> >+    }
> 
> Why is this change needed?
> 

I think it's less needed, because this.mOverMisspelling should not be true if
there is no misspelled word. But then, spellchecker.CheckCurrentWord is as much
unneeded as the try/catch around it.


> Then can we just not attempt to do a spell check in that case?

Good idea. In attached patch, I return early from
mozInlineSpellChecker::ResumeCheck if there is not dictionary available.

> >diff -r 61bb2bb510c9 extensions/spellcheck/src/mozInlineSpellChecker.cpp
> >+    if (NS_FAILED(rv))
> >+      continue;
> 
> Why is this change needed?

Then, it should be less needed. But isn't it better to check for functions
results. (This call may return with NS_ERROR_OUT_OF_MEMORY for example)

Also, this iteration fixes indentation nit.

> > About, when the blur is made inside double timeout, updateCurrentDictionary is
> > not called.
> 
> Why?

I don't known exactly why.
But with something like that *in a reftest*, the focus listener is not
triggered, event if wrapping the setTimeout inside another setTimeout.

    var editor = document.getElementById('editor');
    editor.addEventListener("focus", function() {
        // not reached in reftest
    }, false);
        window.setTimeout(function() {
            editor.blur();
        }, 0);

And we need to trigger the focus listener because that's when UpdateCurrentDictionary is called. Otherwise, test seem to pass, but for the wrong reason: because no dictionary has been set.

> Because spellchecking doesn't happen on focus, we post an event to the event
> loop and start the spellchecking when that event is processed.

So, what I tried to do was using setTimeout inside the focus handler

    var editor = document.getElementById('editor');
    editor.addEventListener("focus", function() {
        window.setTimeout(function() {
            editor.addEventListener("blur", function() {
                document.documentElement.className = '';
            }, false);
            editor.blur();
        }, 0);
    }, false);
    editor.focus();

Unfortunately, this fails on windows platform
http://tinderbox.mozilla.org/showlog.cgi?log=Try/1312952687.1312955723.27999.gz
Actually, I already spent more time than initially desired on this issue, and I
really don't want to try again to guess what's wrong on a platform I don't even
have access to. So, I'll currently stop working and this bug. If someone wants
to step in and fix that issue, and the possible other ones remaining, that
would be nice; otherwise, I may resume on this later.
Attachment #551030 - Attachment is obsolete: true
Status: ASSIGNED → NEW
Try run for 49167a17b8af is complete.
Detailed breakdown of the results available here:
    http://tbpl.mozilla.org/?tree=Try&rev=49167a17b8af
Results (out of 223 total builds):
    success: 202
    warnings: 18
    failure: 3
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/arno@renevier.net-49167a17b8af
Attached patch patch v2.7Splinter Review
I fixed the reftest with kaze and volkmar help. We came to a better solution:
make the test and the reference exactly similar except the reference will have spellcheck=false. Then, there's no need anymore to blur the editor.

I also modified another thing since last patch:


> > Then can we just not attempt to do a spell check in that case?
> 
> Good idea. In attached patch, I return early from
> mozInlineSpellChecker::ResumeCheck if there is not dictionary available.
> 

Actually, I also need to cleanup previously underline words, by calling RemoveRange. Otherwise, a word may stay underlined, even if editor has changed @lang to a unknown one.
That case is coved by test-3

I've tested attached patch on tryserver, and it looks like it's correct:
http://tbpl.mozilla.org/?tree=Try&rev=3adb6f0528ac
Attachment #552033 - Attachment is obsolete: true
Attachment #552655 - Flags: review?(ehsan)
Comment on attachment 552655 [details] [diff] [review]
patch v2.7

Review of attachment 552655 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great!  Thanks a lot for your hard work on this.  :-)
Attachment #552655 - Flags: review?(ehsan) → review+
Landed on mozilla-inbound.
Component: General → Spelling checker
Flags: blocking-firefox2-
Keywords: dev-doc-needed
Product: Firefox → Core
QA Contact: general → spelling-checker
Target Milestone: --- → mozilla8
Attachment #461208 - Attachment is obsolete: true
http://hg.mozilla.org/mozilla-central/rev/f3f7872db0ae
Status: NEW → RESOLVED
Closed: 13 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
> SetCurrentDictionary(NS_LITERAL_STRING("").get());

Can we fix that signature, pretty please?
Blocks: 678842
Oups. Look like there's a serious bug in this patch :(
When user manually sets a non default dictionary, it's dismissed as soon as editor is blured and focused again (because we go the UpdateCurrentDictionary path again).
quick wip patch. I'll try to add some tests tomorrow.
patch to fix issue described in comment 98.
This works by setting a boolean member mDictWasSetManually to PR_TRUE when SetCurrentDictionary is called, but not from UpdateCurrentDictionary
Once that member is true, UpdateCurrentDictionary (and SaveDefaultDictionary) will do nothing.
This patch also fixes a typo in a variable name (s/currentDictonary/currentDictionary/)
Attachment #553025 - Attachment is obsolete: true
Attachment #553143 - Flags: review?(ehsan)
(In reply to Ms2ger from comment #97)
> > SetCurrentDictionary(NS_LITERAL_STRING("").get());
> 
> Can we fix that signature, pretty please?

Here is a patch changing  SetCurrentDictionary and GetCurrentDictionary to use nsAString& instead of PRUnichar*
Attachment #553145 - Flags: review?(ehsan)
I've tested both patches on try server, and they look correct
http://tbpl.mozilla.org/?tree=Try&rev=8300b6949f82
Comment on attachment 553145 [details] [diff] [review]
fix Get/SetCurrentDictionary signature

Some nits:

>--- a/editor/composer/src/nsEditorSpellCheck.cpp	Mon Aug 15 09:56:38 2011 +0200
>+++ b/editor/composer/src/nsEditorSpellCheck.cpp	Mon Aug 15 10:26:10 2011 +0200
>+nsEditorSpellCheck::GetCurrentDictionary(nsAString& aDictionary)
> {
>+  nsresult rv = mSpellChecker->GetCurrentDictionary(aDictionary);
>   return rv;

"return mSpellChecker->GetCurrentDictionary(aDictionary);"

> nsEditorSpellCheck::SaveDefaultDictionary()
> {
>   if (!mDictWasSetManually) {
>     return NS_OK;
>   }
>-  PRUnichar *dictName = nsnull;
>-  nsresult rv = GetCurrentDictionary(&dictName);
> 
>-  if (NS_SUCCEEDED(rv) && dictName && *dictName) {
>-    rv = Preferences::SetString("spellchecker.dictionary", dictName);
>-  }
>+  nsAutoString dictName;
>+  nsresult rv = GetCurrentDictionary(dictName);
>+  NS_ENSURE_SUCCESS(rv, rv);
> 
>-  if (dictName) {
>-    nsMemory::Free(dictName);
>-  }
>-
>+  rv = Preferences::SetString("spellchecker.dictionary", dictName);
>   return rv;

Same comment, and I think you're changing behaviour here if dictName is the empty string (can that happen?)

>@@ -501,65 +485,65 @@ nsEditorSpellCheck::UpdateCurrentDiction
>-  SetCurrentDictionary(NS_LITERAL_STRING("").get());
>+  SetCurrentDictionary(NS_LITERAL_STRING(""));

EmptyString()

>--- a/extensions/spellcheck/src/mozInlineSpellChecker.cpp	Mon Aug 15 09:56:38 2011 +0200
>+++ b/extensions/spellcheck/src/mozInlineSpellChecker.cpp	Mon Aug 15 10:26:10 2011 +0200
>@@ -1755,36 +1755,27 @@ nsresult mozInlineSpellChecker::KeyPress
> }
> 
> NS_IMETHODIMP mozInlineSpellChecker::UpdateCurrentDictionary()
> {
>+    previousDictionary.Assign(NS_LITERAL_STRING(""));

"previousDictionary.Truncate();"

>+    currentDictionary.Assign(NS_LITERAL_STRING(""));

And here

But overall, this looks much better!
Attachment #553145 - Flags: feedback+
patch for the Get/SetCurrentDictionary signature, resolving issues raised my Ms2ger.


> Same comment, and I think you're changing behaviour here if dictName is the
> empty string (can that happen?)

If SetCurrentDictionary has been called with empty string, GetCurrentDictionary will return an error.
Attachment #553145 - Attachment is obsolete: true
Attachment #553150 - Flags: review?
Attachment #553145 - Flags: review?(ehsan)
Comment on attachment 553143 [details] [diff] [review]
part2: do not override manually set dictionary

>+  PRBool mUpdateDictionaryRunning;
>+  PRBool mDictWasSetManually;

Please use PRPackedBool.

>+class UpdateDictionnaryHolder {
>+  private:
>+    nsEditorSpellCheck* mSpellCheck;
>+  public:
>+    UpdateDictionnaryHolder(nsEditorSpellCheck* esc): mSpellCheck(esc) {
>+      if (mSpellCheck) {
>+        mSpellCheck->BeginUpdateDictionary();
>+      }
>+    }
>+    ~UpdateDictionnaryHolder() {
>+      if (mSpellCheck) {
>+        mSpellCheck->EndUpdateDictionary();
>+      }
>+    }
> };

Also, please move this class to the cpp file.
Attachment #553143 - Flags: review?(ehsan) → review+
Comment on attachment 553150 [details] [diff] [review]
part3: fix Get/SetCurrentDictionary signature

This really belongs to another bug...  but r=me nevertheless!
Attachment #553150 - Flags: review? → review+
(In reply to Ehsan Akhgari [:ehsan] from comment #105)
> Comment on attachment 553143 [details] [diff] [review]
> part2: do not override manually set dictionary
> 
> >+  PRBool mUpdateDictionaryRunning;
> >+  PRBool mDictWasSetManually;
> 
> Please use PRPackedBool.
> 
> >+class UpdateDictionnaryHolder {
> >+  private:
> >+    nsEditorSpellCheck* mSpellCheck;
> >+  public:
> >+    UpdateDictionnaryHolder(nsEditorSpellCheck* esc): mSpellCheck(esc) {
> >+      if (mSpellCheck) {
> >+        mSpellCheck->BeginUpdateDictionary();
> >+      }
> >+    }
> >+    ~UpdateDictionnaryHolder() {
> >+      if (mSpellCheck) {
> >+        mSpellCheck->EndUpdateDictionary();
> >+      }
> >+    }
> > };
> 
> Also, please move this class to the cpp file.


Patch fixes those 2 stuffs.
Comment on attachment 552655 [details] [diff] [review]
patch v2.7

>+  PRUnichar *currentDictionary = nsnull;
>+  rv = mSpellCheck->GetCurrentDictionary(&currentDictionary);
Fortunately you fixed this leak in attachment 553150 [details] [diff] [review]!
(In reply to Boris Zbarsky (:bz) from comment #61)
> David, what do you think of the above bits about making the style system's
> concept of "language" take into account xml:lang?

We should do that.  It's bug 234485.
(In reply to arno renevier from comment #98)
> Oups. Look like there's a serious bug in this patch :(
> When user manually sets a non default dictionary, it's dismissed as soon as
> editor is blured and focused again (because we go the
> UpdateCurrentDictionary path again).

Should this bug be fixed in latest Nightly (9.0a1 (2011-08-17))? I'm still seeing it.
Depends on: 682564
Is this in Nightly or Aurora? It's flagged as Gecko 8, but someone documented it as Gecko 9. I'd like to confirm which it is.
(In reply to Eric Shepherd [:sheppy] from comment #112)
> Is this in Nightly or Aurora? It's flagged as Gecko 8, but someone
> documented it as Gecko 9. I'd like to confirm which it is.

It's currently in Aurora, but backout from gecko8 is in process. See bug 684467.
This got written up here:

https://developer.mozilla.org/en/HTML/Controlling_spell_checking_in_HTML_forms#Controlling_the_spellchecker_language

And it's listed on Firefox 9 for developers. Firefox 9? ZOMG!
Bugzilla should set a lang="en" to take advantage of this.
Could you please file the bug (there is a bugzilla Product) and reference this bug there? Thanks.
Depends on: 697981
In which Firefox version is this supposed to be fixed? It still does not work in FF 11.
Michael, this was fixed as of Firefox 8.

If you have a testcase that shows it not working in Firefox 11, please file a bug and cc me?
I've checked it again, and the behaviour is not really consistent. Sometimes it changes the language according to the lang attribute, and sometimes it does not. Is it possible that FF does not really select the language, but the dictionary that was last selected manually on the same page? And sometimes spellchecking is automatically turned off, and I have to turn it back on manually. This is very odd.
If you have steps to reproduce, please file a new bug and we'll discuss it there.
Depends on: 940619
Still FF 31.0 spell checker doesn't respect lang attributes in textareas:

<!DOCTYPE html>
<html lang="en">
<head>
	<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
	<title>Textareas with different lang attributes</title>
</head>
<body>
	<form>
		<textarea lang="it" rows="5" cols="40" spellcheck="true">Questa casella di testo contiene del testo in italiano</textarea>
		<br>
		<textarea lang="en" rows="5" cols="40" spellcheck="true">This textarea contains english text</textarea>
	</form>
</body>
</html>

Whit this code the spell checker will be set to English language on both the textareas, even the one that is clearly marked as lang=it; if I right click on the Italian textarea and choose Italian as language, BOTH the textareas will turn to Italian language! 

Even if you remove the lang attribute on the html tag (which in my opinion shouldn't be necessary, since it correctly states that the text of the page is in English), the behaviour is the very same.

Compare this behaviour with what IE10 does: it correctly uses a different dictionary for each textarea; Chrome doesn't support more that one dictionary ad a time, so it's impossible it could work correctly.
This bug is about the page language only.  Please file a separate bug on using different spellchecking dictionaries for different parts of the page.
From what I read in the first post of this bug, it refers EXACTLY to the fact that the browser should respect BOTH the default page language AND the lang attribute specified in each textarea...

And that is exactly the bug I was talking about!
Yes, but this bug was then morphed into just following the general-page-specified language, and fixed in that form.  Which is why followup work should happen in a separate bug.
So this turned into fixing a different bug? :)

Ok, I'll post another bug report with the very same test case!
See Also: → 1184249
Depends on: 1209220
You need to log in before you can comment on or make changes to this bug.