Open Bug 484942 Opened 16 years ago Updated 8 months ago

Spell checker doesn't catch any error with numbers inside words

Categories

(Core :: Spelling checker, defect)

x86
Windows Vista
defect

People

(Reporter: ehsan.akhgari, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: dom-lws-bugdash-triage)

I'm not sure whether this is the intended behavior, but I find it very strange. See the test URL for an example of the problem: the second word is not recognized as a spelling mistake.
It's only intended behavior if you're l33t. Interestingly, if you run "l33t" directly through Hunspell, it gets flagged as incorrect, but no suggestions are offered. So probably an upstream bug with Hunspell.
Using Vista and Thunderbird 24.1.1 with the British English dictionary 1.19.1 add-on. In a new Write (compose) window, if I accidentally type e.g. m3thod, the spell checker does not automatically recognise it as a spelling mistake. If I manually select to check spelling, the 'm' is highlighted, and if I select 'Ignore', then the 'thod' is highlighted. Obviously, this draws my attention to the word, but the checker does not recognise the incorrectly spelt word, so it does not offer a correct spelling.
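
The 'm' then 'thod' highlighting above is consistent with the manual checker treating digits as word breaks. That is only a guess from the reported behaviour, not something confirmed in this bug. A minimal sketch of such a split, where `letter_runs` is a hypothetical helper and not Thunderbird code:

```python
import re

# Hypothetical illustration only: split text into runs of letters,
# treating digits (and anything else that is not a letter) as separators.
_LETTER_RUN = re.compile(r"[^\W\d_]+")


def letter_runs(text: str) -> list[str]:
    """Return the maximal runs of letters found in text."""
    return _LETTER_RUN.findall(text)


print(letter_runs("m3thod"))
```

Under that assumption, each fragment is checked on its own, so the checker never sees the whole word and cannot suggest "method".
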
See Also: → 621523
Severity: normal → S4

Using Windows 10 and Thunderbird version 91.11.0
Using British English Dictionary (Marco Pinto)
A fork of Mark Tyndall's add-on, based on David Bartlett's British Dictionary R1.19 for Firefox, Thunderbird and SeaMonkey. V3.0.9

Also tested in Troubleshooting Mode - same result.
Spellcheck as you type is selected.

In Write type: i9ll go f4o a w1lk
No spelling fault auto detected.
Click on 'Spelling':
If in Troubleshooting Mode then nothing appears as misspelt.
In a normal start, if I select 'Spelling' it pops up with lk as the misspelt word, offering every possible fix except for the fact that it is not wrong.

When I update to 102, I'll check this again.

The behavior is still reproducible in Nightly 104. I'm still not sure this is intended, maybe dminor knows?

Flags: needinfo?(dminor)

I dug into this a little bit, and "words" with numbers in them don't seem to make it to hunspell, so we're filtering them out somewhere in Firefox, maybe when we're segmenting the input text into words? I'll leave the needinfo because it will take me more time to track this down.

This is intentional behaviour [1]. We've been doing this since the original import from CVS to mercurial [2]. Fwiw, I checked in Chromium, and they do spellcheck words like this.

To me, it would make sense if we only skipped the word if it was all digits, but without looking at the CVS history, I'm not sure what the original intention was here. :smaug, what do you think?

[1] https://searchfox.org/mozilla-central/rev/f655bdf6b4bf01b42609750ab94fc37635397260/extensions/spellcheck/src/mozInlineSpellWordUtil.cpp#638.
[2] https://hg.mozilla.org/mozilla-central/file/9b2a99adc05e53cd4010de512f50118594756650/extensions/spellcheck/src/mozInlineSpellWordUtil.cpp#l998.
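
To make the difference concrete, here is a minimal sketch of the two rules discussed in this comment: skipping any word that contains a digit (the behaviour described here) versus skipping only words made up entirely of digits (the suggested alternative). The function names are hypothetical and this is not the code in mozInlineSpellWordUtil.cpp. It relies on Python's `str.isdigit`, which may classify characters differently from Firefox.

```python
def contains_digit(word: str) -> bool:
    """True if at least one character in word is a digit."""
    return any(ch.isdigit() for ch in word)


def is_all_digits(word: str) -> bool:
    """True if word is non-empty and every character is a digit."""
    return word != "" and all(ch.isdigit() for ch in word)


def should_check_current(word: str) -> bool:
    # Behaviour as described in this comment: any digit means the word is skipped.
    return not contains_digit(word)


def should_check_proposed(word: str) -> bool:
    # Suggested alternative: skip only words that are entirely digits.
    return not is_all_digits(word)


for w in ("method", "m3thod", "2024"):
    print(w, should_check_current(w), should_check_proposed(w))
```

With the suggested rule, "m3thod" would reach the dictionary lookup while "2024" would still be skipped.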

Flags: needinfo?(dminor) → needinfo?(smaug)

This might be a rather language-specific issue that should be handled at the Hunspell level but isn't, at least not too well, and so someone decided to add that check for numbers to hide some common cases where something was incorrectly marked as misspelled.

(Testing English here only)
Interestingly, '5pm' is marked as misspelled in Chrome/Linux but not in LibreOffice or Firefox. In fact, LibreOffice seems to have the same behavior as Firefox: adding a number to a misspelled word makes it non-misspelled.

Testing some online text processors:
Google Docs' spellchecker behaves a bit oddly, but I think it never marks a word as misspelled if the word starts or ends with a number, but does if there is a number somewhere in the middle of the word.
Word 365 doesn't seem to mark any words with numbers in them as misspelled (same behavior as in Firefox and LibreOffice).

I think that seems to hint that the current behavior isn't too unreasonable, at least with English.

Flags: needinfo?(smaug)

Ignoring words starting with numbers means it misses 1th, 2th, 3th, etc., which shows it's still worth supporting in some way.

(In reply to Kagami :saschanaz from comment #9)

Ignoring words starting with numbers means it misses 1th, 2th, 3th, etc., which shows it's still worth supporting in some way.

Agreed, a number followed by any of these should be allowed: st, nd, rd, th, am, pm.
But if that's not possible, I would rather have spellcheck flag all numbers that are mixed with letters, with independent numbers not flagged. I can choose to ignore a flag, but if an error occurs and it's not flagged, then sending a badly spelt message is not desirable.
Basically, it is worse to not flag a bad spelling than to flag an acceptable mix.
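
A rough sketch of the policy proposed above, assuming a hypothetical `classify` helper and the suffix list from this comment. It is not an existing Firefox or Hunspell API. Plain numbers are skipped, a number followed by an allowed suffix is accepted, any other mix of digits and letters is flagged, and ordinary words go to the dictionary as usual.

```python
import re

# Suffix list taken from the comment above; everything else is illustrative.
_NUMBER_WITH_SUFFIX = re.compile(r"\d+(?:st|nd|rd|th|am|pm)", re.IGNORECASE)


def classify(word: str) -> str:
    """Return 'skip', 'allow', 'flag' or 'check' for a single word."""
    if word.isdigit():
        return "skip"    # independent number: never flagged
    if _NUMBER_WITH_SUFFIX.fullmatch(word):
        return "allow"   # e.g. 3rd, 5pm
    if any(ch.isdigit() for ch in word):
        return "flag"    # digits mixed with letters: always flagged
    return "check"       # ordinary word: normal dictionary lookup


for w in ("2024", "5pm", "3rd", "1th", "m3thod", "walk"):
    print(w, classify(w))
```

Note that a plain allow-list like this accepts "1th", which the earlier comment wanted caught. Catching it would need an extra rule matching each number to its correct ordinal suffix.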

Whiteboard: dom-lws-bugdash-triage