Closed Bug 318040 Opened 19 years ago Closed 9 years ago

Spell checker flags words containing full stops (periods)

Categories: Core :: Spelling checker, defect
Platform: x86, Windows XP
Type: defect; Priority: Not set; Severity: minor

Status: RESOLVED FIXED
Target Milestone: mozilla40
Tracking Status: firefox40 --- fixed

People: Reporter: u63580; Assigned: nemeth; NeedInfo

Attachments: 1 file

If I have an e-mail containing, for example, "etc.", the spell checker flags it as misspelled. The spell checker then suggests "etc.", and accepting that suggestion gives "etc.." in my e-mail.
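The symptom can be reproduced with a deliberately simplified model. The Python sketch below assumes a tokenizer that treats the full stop as a word boundary and a dictionary that stores "etc." with its full stop (as en-GB does); the regex, the toy dictionary and the `check` and `apply_suggestion` helpers are hypothetical and are not taken from Mozilla's or Hunspell's code.

```python
import re

# Hypothetical toy dictionary: like en-GB, it stores the abbreviation WITH its full stop.
DICTIONARY = {"see", "the", "list", "etc."}

# Assumption: the tokenizer treats "." as a word boundary, so tokens never include it.
WORD_RE = re.compile(r"[A-Za-z]+")


def check(text):
    """Return (start, end, token) for every token not found in the dictionary."""
    return [(m.start(), m.end(), m.group()) for m in WORD_RE.finditer(text)
            if m.group().lower() not in DICTIONARY]


def apply_suggestion(text, start, end, suggestion):
    """Replace only the flagged token's span; the original '.' stays in place."""
    return text[:start] + suggestion + text[end:]


text = "See the list etc."
flagged = check(text)
print(flagged)  # [(13, 16, 'etc')]
start, end, _ = flagged[0]
print(apply_suggestion(text, start, end, "etc."))  # See the list etc..
```

Because only the span of the bare token "etc" is replaced, the full stop already in the text survives next to the one carried by the suggestion.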

I'm using TB version 1.5 (20051120), but I assume this is Core -> Spell checker.

I think this must be a dupe, but I can't find it.  Am I being stupid?
A bit more fiddling about reveals this is only a problem with the en-GB dictionary, not en-US (obviously I'm in the UK!).  So I assume this is dictionary-related?
Summary: Spell checker flags words ending in full stops (periods) even if in dictionary → Spell checker flags words ending in full stops (periods) in en-GB dictionary
Component: Spelling checker → en-GB / English, United Kingdom
Product: Core → Mozilla Localizations
Version: 1.4 Branch → unspecified
Have updated component
Status: UNCONFIRMED → NEW
Ever confirmed: true
Assignee: mscott → nobody
QA Contact: spelling-checker
This is not only a problem with en-GB, but also with de-DE.

Abbreviations like 'evtl.' and 'usw.' are treated the same way.

So this is probably not dictionary-related but localization-related (assuming that en-US doesn't involve any localization).
(In reply to comment #3)
> This is not only a problem with en-GB, but also with de-DE.
> 
> Abbreviations like 'evtl.' and 'usw.' are treated the same way.
> 
> So this is probably not dictionary-related but localization-related (assuming
> that en-US doesn't involve any localization).
> 

Could well be.  I have raised the issue with OpenOffice, and will see what I get from them, just to be on the safe side.
This bug was reported on mozdev way back in 2002: http://bugzilla.mozdev.org/show_bug.cgi?id=2075

It is still unresolved. 

It is a problem in the implementation of the spelling checker. The reason why it does not occur with en_US is that there are no words that include a full stop in them in the en_US dictionary. 

The same problem was fixed a long time ago in OpenOffice.org - it (certainly the version 2 series) handles words (abbreviations) containing full stops correctly.

David.
Okay, this is therefore Core - Spell checker.  The problem isn't the dictionary, but is that en-US doesn't reveal the bug as there are no full stops in it!  I'm surprised there is no core bug on this, but I can't see one.  So I'll move the component.
Assignee: nobody → mscott
Component: en-GB / English, United Kingdom → Spelling checker
Product: Mozilla Localizations → Core
QA Contact: spelling-checker
Summary: Spell checker flags words ending in full stops (periods) in en-GB dictionary → Spell checker flags words ending in full stops (periods)
Version: unspecified → Trunk
Please note that the problem is not only with words ending in full stops (dot on the line), but also with those where the dot is part of the abbreviation itself: for example "i.e." and others.
Altered summary to reflect more general problem
Summary: Spell checker flags words ending in full stops (periods) → Spell checker flags words containing full stops (periods)
The continued presence of this bug is not great.  It does not look good in localisations at all, and now of course shows up in FF as well as TB (fill in a field in a 2.0 build using, for example, en-GB to see it).  Any idea when we might see something happening here - 2.0, 3.0, ... ?
Assignee: mscott → nobody
This is another instance of tokenizer problems (Bug 355178).

The simple fix is to change the en-GB and de dictionaries to do what the en-US dictionary does: include "etc", not "etc.".

The cost is that it will falsely allow "etc" as a word. But that seems more acceptable than calling "etc." a misspelling and changing it to "etc..".
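An alternative at the tokenizer level can be sketched in Python as an illustration only. It is not the content of the patch that eventually landed (that patch is not reproduced in this thread): the dictionary keeps storing "etc.", and when a bare token is unknown but is immediately followed by a full stop, the checker also looks up the token together with that full stop. The toy dictionary and the `is_known` and `check` helpers are hypothetical.

```python
import re

# Hypothetical toy dictionary that keeps the full stop in the abbreviation.
DICTIONARY = {"see", "the", "list", "etc."}

# Assumption: tokens are runs of letters, so "." is never part of a token.
WORD_RE = re.compile(r"[A-Za-z]+")


def is_known(text, start, end):
    """Accept a token if it is in the dictionary on its own or, when a full stop
    immediately follows it in the text, if token + "." is in the dictionary."""
    word = text[start:end].lower()
    if word in DICTIONARY:
        return True
    # Hypothetical fallback: look one character ahead for a trailing full stop.
    return text[end:end + 1] == "." and word + "." in DICTIONARY


def check(text):
    """Return the tokens that would be flagged as misspelled."""
    return [m.group() for m in WORD_RE.finditer(text)
            if not is_known(text, m.start(), m.end())]


print(check("See the list etc."))  # []
print(check("See the list etc"))   # ['etc']
```

Unlike the dictionary change proposed above, this model keeps a bare "etc" flagged, at the cost of one extra lookup for each unknown word that is followed by a full stop.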
Depends on: 355178
This isn't only a problem with periods, but with any internal punctuation.  For example, here is what I noted in bug 362453 comment #0:

> I have also encountered a similar, possibly related, problem with hyphenated
> words that are manually added to the dictionary.  For example, I added the word
> "E-Bru" (the name of my band) [to the dictionary], but when I type it Firefox 
> highlights "Bru" as misspelt (and in this case "E-Bru" is not listed as 
> an alternative).

It is possible that this may be a separate issue, in which case I am happy to log a new bug for it.  If it is part of the same problem then the summary needs updating to reflect that.
Hi Mark, the hyphen issue has already been filed as bug 466127.
Thanks - I hadn't spotted that.  Bug 466127 has been resolved as a duplicate of bug 355178, and in THAT bug there is discussion that may be relevant to this issue.  It regards the tokenizers in both Hunspell and the various Mozilla products that use it, and how they have been/need to be modified in order to handle hyphenation correctly.  

The cause of this bug seems to be the same thing (incorrect tokenization) but I am unable to tell from the details in bug 355178 whether the proposed fix would also resolve the issue here.
I can't believe this bug was reported in 2005 and still has not been solved. I am using Firefox 10.0.2 with the de-DE dictionary. It still marks "etc." as wrong, highlighting "etc" and suggesting substituting it with "etc.", which leads to "etc..", with "etc" highlighted again.
I confirm that this bug is still with us, and affects all platforms :-(

BTW, for en_US there is the opposite bug: many words wrongly containing a "." are *not* marked as misspelled, for example: "w.rt." (where only "w.r.t." should be okay).
A workaround: remove the trailing ".", add the remaining word (e.g., "etc") to your personal dictionary, and re-add the "." to your text.
Of course, the drawback is that any "etc" without a following "." is not flagged.
Comment on attachment 8467800 [details] [diff] [review]
hunspell_mozilla_abbreviations.patch

Ehsan, how does this look to you?
Attachment #8467800 - Flags: review?(ehsan)
Comment on attachment 8467800 [details] [diff] [review]
hunspell_mozilla_abbreviations.patch

Review of attachment 8467800 [details] [diff] [review]:
-----------------------------------------------------------------

Sure!
Attachment #8467800 - Flags: review?(ehsan) → review+
I'm glad that this issue now appears to be fixed (just tried with TB 31.3.0) - thanks!

I have now noticed just one very minor issue: anything of the form 'x.y.z.' (i.e. any sequence of single letters with periods in between) is never marked as misspelled. I presume that this is because all single letters (followed by a space) are considered fine.
For instance, the English abbreviation 'w.r.t.' is fine, while 'w.x.t.' should be flagged as wrong.
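The guess above can be illustrated with a small Python model. It assumes, as the comment does, that every single letter counts as a valid word, and it contrasts checking each letter of a dotted abbreviation separately with looking up the whole dotted sequence. The dictionary contents, the `DOTTED_RE` pattern and both helper functions are hypothetical; this is not a description of how Hunspell actually behaves.

```python
import re

# Hypothetical dictionary: a few dotted abbreviations plus every single letter,
# which is the behaviour the comment above presumes.
DICTIONARY = {"w.r.t.", "i.e.", "e.g."} | set("abcdefghijklmnopqrstuvwxyz")

# Matches two or more single letters, each followed by a full stop, e.g. "w.x.t.".
DOTTED_RE = re.compile(r"(?:[A-Za-z]\.){2,}")


def piecewise_ok(abbrev):
    """Model of the presumed behaviour: check each letter on its own."""
    return all(piece.lower() in DICTIONARY for piece in abbrev.split(".") if piece)


def whole_token_ok(abbrev):
    """Stricter alternative: look up the entire dotted sequence as one word."""
    return abbrev.lower() in DICTIONARY


for abbrev in DOTTED_RE.findall("see w.r.t. and w.x.t. here"):
    print(abbrev, piecewise_ok(abbrev), whole_token_ok(abbrev))
# w.r.t. True True
# w.x.t. True False
```

Checking piece by piece cannot tell 'w.r.t.' from 'w.x.t.'; a whole-token lookup can, but only for abbreviations that are actually listed in the dictionary.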
Abbreviations (both those predefined in dictionaries and user-defined ones in persdict.dat) like "etc." are no longer flagged as errors - but leaving out the period, as in "etc", is not flagged as an error either.
This is likely because the period is not stored in the dictionary. So far, then, not much has been gained.
Looks like this r+ patch never landed. What's the status of this bug?
Flags: needinfo?(ehsan)
(In reply to aleth [:aleth] from comment #28)
> Looks like this r+ patch never landed. What's the status of this bug?

I don't know.  Ryan?
Flags: needinfo?(ehsan) → needinfo?(ryanvm)
Bug is still alive and kicking, just as it was a decade(!) ago.
Keywords: checkin-needed
Nobody ever requested checkin AFAICT. That said, this came up in discussion in #maildev today and that led to discussion w/ Nemeth about trying to get an updated Hunspell release out soon that would include this fix.

Thinking about it more, though, I guess one advantage to just landing the patch here is that it would make for an easier backport to the release branches, since I'm guessing TB38 might want this.

https://hg.mozilla.org/integration/mozilla-inbound/rev/b931d9f7c644
Flags: needinfo?(ryanvm)
Keywords: checkin-needed
I just checked again with the currently released TB version 31.6.0:
my rather positive comments of December 2014 referred to Mac OS,
while on my Windows machine the bug is still present.
https://hg.mozilla.org/mozilla-central/rev/b931d9f7c644

This will be in tomorrow's Firefox and Thunderbird nightlies if people want to test it out. Would be nice to get verification that things work as expected, then we can discuss uplifting to Aurora/Beta for 38/39 as well.
Assignee: nobody → nemeth
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
I'm not sure this is fixed for all cases; here are my results:

On Windows 7 64bit, using x64 Nightly en-GB. 

Name: Firefox
Version: 40.0a1
Build ID: 20150419030206
Update Channel: nightly
User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:40.0) Gecko/20100101 Firefox/40.0
Multiprocess Windows: 0/1 (default: false)

Testing a few examples (in the left column).


Word tested   Fx 39, 32bit (old behaviour)   Fx 40, 64bit (new behaviour)   Pass/Fail
------------  -----------------------------  -----------------------------  ----------------------------------
etc.          "etc." is Red                  is OK                          Pass for "etc."
w.r.t.        "w." is Red, "r.t." is OK      still as Fx 39                 need to check
i.e.          "i." is Red, "e." is OK        still as Fx 39                 need to check
e.g.          is OK                                                         was OK in Fx 39
sic.          is OK                                                         was OK in Fx 39
w.x.t.        "w." is Red, "x.t." is OK      still as Fx 39                 need to check (see comment #26)
B.B.C.        is OK                                                         was OK in Fx 39
scot-free     is OK                                                         was OK in Fx 39
color         is Red (good test of en-GB)                                   still Red - good

Conclusion: it may need more tests.
Perhaps I need to add some Latin words to MY en-GB dictionary?

DJ-Leith
I am using Thunderbird 47.1 and the EN-GB dictionary. I get the "i" of "i.e." flagged as a misspelling. Also, it doesn't ignore all email addresses, which is irritating as they get flagged as misspelt.