Closed Bug 637949 Opened 13 years ago Closed 7 years ago

that'll & that'd are flagged as misspelled

Categories

(Core :: Spelling checker, defect)

x86_64
All
defect
Not set
normal

Tracking

RESOLVED FIXED
mozilla54
Tracking Status
firefox52 --- fixed
firefox53 --- fixed
firefox54 --- fixed

People

(Reporter: dholbert, Assigned: RyanVM)

References

Details

(Keywords: regression)

STEPS TO REPRODUCE:
 1. Visit http://pastebin.mozilla.org/
 2. Type "that'll" or "that'd" in the text box

ACTUAL RESULTS: Red underline (flagged as misspelled)
EXPECTED RESULTS: No red underline

Trunk gives me ACTUAL RESULTS
Firefox 3.6.13 gives me EXPECTED RESULTS.

Mozilla/5.0 (X11; Linux x86_64; rv:2.0b13pre) Gecko/20110301 Firefox/4.0b13pre

NOTE: I tried some variations with different base words (it'll/it'd/you'll/you'd) and they all worked as expected.  "that" seems to be the odd one out.
Can you test with 3.6.14 please?
This could be a regression from the hunspell update (bug 579649) that landed for 1.9.2.14...
You're right - this is broken in Firefox 3.6.14.
Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9.2.14) Gecko/20110218 Firefox/3.6.14
Broken with Hunspell 1.3.2 as well.
Nemeth, is this something to be fixed on the hunspell side?
Ehsan, could you check the English dictionaries? On my platform, both the British and American English dictionaries contain these forms, too:

$ grep "'\(d\|ll\)" en_US.dic | tr '\n' ' '
I'd I'll cont'd he'd he'll it'd it'll rec'd she'd she'll somebody'll someone'll spec'd that'd that'll there'd there'll they'd they'll this'll today'll we'd we'll what'd where'd who'd who'll you'd you'll 
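The grep above can be approximated in Python. This is a minimal sketch under simplifying assumptions: the first line of a `.dic` file is the entry count, anything after whitespace (morphological fields) is ignored, and `/FLAGS` affix suffixes are stripped. Unlike the grep, it only matches `'d` or `'ll` at the end of a word. The helper name `contractions` is hypothetical.

```python
import re

# Words ending in 'd or 'll, e.g. "that'd", "who'll".
CONTRACTION = re.compile(r"'(?:d|ll)$")


def contractions(dic_text):
    """Return contraction entries from the text of a Hunspell .dic file.

    Simplified parsing (assumption): line 1 is the entry count, fields after
    whitespace are ignored, and "/FLAGS" suffixes are stripped.
    """
    words = []
    for line in dic_text.splitlines()[1:]:
        fields = line.split()
        if not fields:
            continue
        word = fields[0].split("/", 1)[0]  # drop affix flags such as "/M"
        if CONTRACTION.search(word):
            words.append(word)
    return words


sample = "5\nthat'll\nhe'd/M\ncat/S\nwho'll\nwhy'd\n"
print(contractions(sample))  # ["that'll", "he'd", "who'll", "why'd"]
```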

But the typographical apostrophe (’ = U+2019) could be a problem. You need correct UTF-8 -> 8-bit conversion for 8-bit dictionaries: U+2019 -> U+0027 (ASCII apostrophe).

But if your English dictionaries are UTF-8 encoded, you have to add these abbreviated forms with typographical apostrophes to the dic file, or the following input conversion to the aff file:

ICONV 1
ICONV ’ '

It's useful to define output conversion, too:

OCONV 1
OCONV ' ’
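To illustrate what these directives ask for, here is a hypothetical sketch of the conversion step. It is not Hunspell's implementation: real ICONV/OCONV tables are applied internally by the library, while this toy version does plain string replacement against a small made-up word list stored with ASCII apostrophes. It also shows the 8-bit point: U+2019 has no ISO-8859-1 code point, so it must become U+0027 before encoding.

```python
# Hypothetical sketch of Hunspell-style ICONV/OCONV tables (plain replacement,
# not Hunspell's actual matching rules).
ICONV = [("\u2019", "'")]  # mirrors "ICONV ’ '": typographic -> ASCII apostrophe
OCONV = [("'", "\u2019")]  # mirrors "OCONV ' ’": ASCII -> typographic apostrophe

# Toy word list: entries are stored with ASCII apostrophes only.
DICTIONARY = {"that'll", "that'd", "it'll", "it'd"}


def convert(word, table):
    """Apply each (source, target) pair of a conversion table to word."""
    for src, dst in table:
        word = word.replace(src, dst)
    return word


def is_correct(word):
    """Input conversion runs before the dictionary lookup."""
    return convert(word, ICONV) in DICTIONARY


def to_8bit(word, encoding="iso-8859-1"):
    """Map U+2019 to U+0027 first; encoding U+2019 directly would raise."""
    return convert(word, ICONV).encode(encoding)


print(is_correct("that\u2019ll"))                  # True
print(convert("that'd", OCONV) == "that\u2019d")   # True
print(to_8bit("that\u2019ll"))                     # b"that'll"
```

Without the ICONV step, `"that\u2019ll"` would miss the ASCII-apostrophe entry, which is the failure mode described above.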
(In reply to comment #6)
> Ehsan, Could you check the English dictionaries? In my platform both of British
> and American English dictionaries contain these forms, too:
> 
> $ grep "'\(d\|ll\)" en_US.dic | tr '\n' ' '
> I'd I'll cont'd he'd he'll it'd it'll rec'd she'd she'll somebody'll someone'll
> spec'd that'd that'll there'd there'll they'd they'll this'll today'll we'd
> we'll what'd where'd who'd who'll you'd you'll 

ehsanakhgari:~/moz/mozilla-central/extensions/spellcheck/locales/en-US/hunspell [03:53:23]$ grep "'\(d\|ll\)" en-US.dic | tr '\n' ' '
I'd I'll he'd he'll it'd it'll rec'd she'd she'll they'd they'll we'd we'll who'd who'll why'd you'd you'll
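Comparing the two grep outputs shows which contractions Mozilla's `en-US.dic` lacks. The sketch below only uses the two word lists quoted in this thread (the `en_US.dic` from comment #6 and Mozilla's `en-US.dic`); the variable names are illustrative.

```python
# Word lists copied from the two grep outputs quoted in this bug.
COMMENT_6_EN_US = (
    "I'd I'll cont'd he'd he'll it'd it'll rec'd she'd she'll somebody'll "
    "someone'll spec'd that'd that'll there'd there'll they'd they'll this'll "
    "today'll we'd we'll what'd where'd who'd who'll you'd you'll"
).split()
MOZILLA_EN_US = (
    "I'd I'll he'd he'll it'd it'll rec'd she'd she'll they'd they'll we'd "
    "we'll who'd who'll why'd you'd you'll"
).split()

missing = sorted(set(COMMENT_6_EN_US) - set(MOZILLA_EN_US))
extra = sorted(set(MOZILLA_EN_US) - set(COMMENT_6_EN_US))
print("missing from Mozilla's list:", missing)
print("only in Mozilla's list:", extra)
```

`that'd` and `that'll` both appear in `missing`, consistent with the report, along with several other forms such as `there'd` and `what'd`.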

Does that mean that this will be fixed by just adding these to the dictionary?

Why was it not broken before the hunspell upgrade we had?  Our en_US.dic has not changed in ages...

> But the typographical apostrophe (’ = U+2019) could be a problem. You need
> correct UTF-8 -> 8-bit conversion for 8-bit dictionaries: U+2019 -> U+0027
> (ASCII apostrophe).

How would we do that?  Are you talking about the usage of U+2019 in en-US.dic?

> But if your English dictionaries are UTF-8 encoded, you have to add these
> abbreviated forms with typographical apostrophes to the dic file, or the
> following input conversion to the aff file:
> 
> ICONV 1
> ICONV ’ '
> 
> It's useful to define output conversion, too:
> 
> OCONV 1
> OCONV ' ’

Is there any drawback to specifying these directives in the affix file anyway?
Mozilla/5.0 (X11; Linux i686; rv:49.0) Gecko/20100101 Firefox/49.0

I tested this issue on Ubuntu 12.04 x32, Mac OS X 10.11 and Windows 10 x64, with the latest Firefox release (46.0.1) and the latest Nightly (49.0a1-20160531030258), and managed to reproduce it. I typed "that'll" and "that'd" in the text box, and both were flagged as misspelled.
OS: Linux → All
Same as bug 422982?
I don't think so.
 (1) Bug 422982 goes away when you change lines (per Bug 422982 comment 1), whereas this bug does not.
 (2) bug 422982 only affects rich text fields & not plain text fields (per bug 422982 comment 10), whereas this bug *does* affect plain text fields.
 (3) Bug 422982 mentions "couldn't" and "isn't" being flagged as misspelled, but those words are fine if I use them with this bug's STR (typing them in at https://pastebin.mozilla.org/ )
I cannot reproduce this. When reporting bugs like this, it is very important to state the actual dictionary used (en-US SCOWL, en-UK from Marco Pinto, etc.) and its version.

Otherwise it's a shot in the dark to try to reproduce this bug.
The en-US dictionary we ship is a tweaked version of upstream SCOWL. You can see it at:
https://dxr.mozilla.org/mozilla-central/source/extensions/spellcheck/locales/en-US/hunspell

And FWIW, I see both that'd and that'll being flagged as misspelled inside this text box as I write the comment :)
I'm on Ubuntu right now, and there Firefox uses the system-wide dictionaries installed via the package manager. Ubuntu ships the old English dictionary from 2007, which can be found on the Hunspell SourceForge site. With that dictionary, this bug is not reproducible.

I was able to reproduce the bug with en-US SCOWL, so it's a dictionary bug and should be forwarded to SCOWL.

Basically

Old dic en_US - OK
Marco   en_GB - OK
SCOWL   en_US - BAD.
Hey Kevin, any thoughts for how to fix this?
Flags: needinfo?(kevin.bugzilla)
Is this https://github.com/en-wl/wordlist/issues/122? At which point we'd need to change WORDCHARS to 0123456789’ IIUC.
No.  The words are missing, that is all.  If you submit an issue I will consider adding them.
Nope, it's probably not that.

Funnily enough, the Hunspell library itself never actually uses WORDCHARS. Those characters are meant for the host application, so that it can take them into account when splitting text into words.

The command-line hunspell uses that field. AFAIK, Mozilla's text parser doesn't use WORDCHARS at all, and that is a good thing. From today's perspective, WORDCHARS should be deprecated, and carefully designed text parsing should be moved into the library.

The issue is probably in the affixes.
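Since word splitting is left to the host application, here is a hypothetical host-side tokenizer that keeps contractions whole. It is not Mozilla's actual text parser. It assumes that an ASCII apostrophe or U+2019 counts as a word character only when it sits between two letters, which plays roughly the role WORDCHARS is meant to signal.

```python
import re

# Hypothetical tokenizer (not Mozilla's parser): runs of letters, optionally
# joined by an ASCII (') or typographic (U+2019) apostrophe between letters.
# [^\W\d_] matches any Unicode letter.
WORD = re.compile(r"[^\W\d_]+(?:['\u2019][^\W\d_]+)*")


def tokenize(text):
    """Split text into candidate words to hand to the spell checker."""
    return WORD.findall(text)


print(tokenize("I think that'll work, and that\u2019d be fine."))
```

With this splitting, "that'll" reaches the dictionary as one token, so whether it is flagged depends only on the dictionary and affix data.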
Dimitrij: I am the author of the upstream wordlist.  The words are not in the upstream word list see: http://app.aspell.net/lookup?dict=en_US;words=that%27ll%0D%0Athat%27d%0D%0A.

They may be added to Mozilla's version of the dictionary, but I do not have time to check.
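The lookup link in the previous comment URL-encodes each word followed by CRLF. A small sketch of building such a link is below. The helper `lookup_url` is hypothetical, and the query format is inferred only from that one URL.

```python
from urllib.parse import quote


def lookup_url(words, dict_name="en_US"):
    """Build an app.aspell.net lookup link in the format seen in this bug."""
    # Each word is followed by CRLF; quote(..., safe="") turns ' into %27
    # and CR/LF into %0D%0A.
    payload = "".join(word + "\r\n" for word in words)
    return "http://app.aspell.net/lookup?dict=" + dict_name + ";words=" + quote(payload, safe="")


print(lookup_url(["that'll", "that'd"]))
```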
Flags: needinfo?(kevin.bugzilla)
Depends on: 1333648
Fixed by bug 1333648.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Assignee: nobody → ryanvm